CALIPSO-project / SPINacc

A spinup acceleration procedure for land surface models (LSM). Developer team: Mandresy Rasolonjatovo, Tianzhang Cai, Matthew Archer, Daniel Goll

Fix evaluation #84

Closed: ma595 closed this 4 days ago

ma595 commented 1 week ago

This fixes the current visualisation (step 5) regression for DEF_Trunk. I have cherry-picked a commit from @tztsai and made a few modifications. This should be committed along with #45

Once merged, SPINacc can be run end to end, and is ready for @dsgoll123 to check.

dsgoll123 commented 5 days ago

I successfully ran tasks 1-5. The Eval.png looks plausible. I did a test with an increased number of training data, and the Eval.png shows improved ML predictions, as expected.

Two problems I encountered which are not directly related to the code modifications here:

(1) README.md: the following action is not possible, as the file doesn't exist: ‘In tests/config.py you have to modify: test_path=/your/path/to/SPINacc/EXE_DIR/’. Should it be this file instead: DEF_Trunk/config.py?

(2) Running tasks 1-4 produces no error message in the log, but I cannot locate the test results file listed in the log file (I looked in EXE_DIR, as specified in the last item of 'steps' in the README). Entries from the log file:

Task 1 reproducibility test results have been stored in tests_results.txt
Task 2 reproducibility test results have been stored in tests_results.txt
Task 3 reproducibility test results have been stored in tests_results.txt

ma595 commented 4 days ago

@dsgoll123 In DEF_Trunk/config.py, test_path should point to your output directory, i.e. wherever EXE_DIR is.
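
For reference, the relevant line in DEF_Trunk/config.py would look something like this (a minimal sketch; the path is a placeholder for your own output location):

```python
# DEF_Trunk/config.py (sketch: only the line in question; keep the rest of your config unchanged)
# test_path must point at the directory where SPINacc writes its outputs,
# i.e. the same location you use as EXE_DIR.
test_path = "/your/path/to/SPINacc/EXE_DIR/"
```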

Why does this relate to README.md? I think the README.md needs a significant rewrite, which I'll be happy to attempt.

For (2), it looks like we're no longer outputting to tests_results.txt, so this message is no longer valid. We have two options: reproduce exactly what was there before (I'll check out the old code to verify exactly what was in there; I suspect it was just a success or fail), or delete it entirely.

ma595 commented 4 days ago

I have looked at the previous version of tests_results.txt and it contains a little more than I first thought.

Reproducibility test for task 1 checks the element-wise absolute error of dist_all.npy vs the reference (an (8,14) dimension array).
Reproducibility test for task 2 checks the contents of IDloc.npy, IDSel.npy and IDx.npy vs the reference.
Reproducibility test for task 3 checks the contents of SRF_FGSPIN.10Y.ORC22v8034_19101231_sechiba_rest.nc vs the reference (dimensions and variables).
Reproducibility test for task 4 checks all *.png files, and the contents of SBG_FGSPIN.340Y.ORC22v8034_22501231_stomate_rest.nc.
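
For context, the old checks amount to something like the following (a rough sketch, not the original test code; `out_dir` and `ref_dir` are hypothetical stand-ins for the run output and the reference data):

```python
import numpy as np
from netCDF4 import Dataset

def check_task1(out_dir, ref_dir):
    # Task 1: element-wise absolute error of dist_all.npy vs the reference (an 8x14 array).
    out = np.load(f"{out_dir}/dist_all.npy")
    ref = np.load(f"{ref_dir}/dist_all.npy")
    return np.abs(out - ref)

def check_task2(out_dir, ref_dir):
    # Task 2: the ID arrays must match the reference exactly.
    names = ("IDloc.npy", "IDSel.npy", "IDx.npy")
    return all(np.array_equal(np.load(f"{out_dir}/{n}"), np.load(f"{ref_dir}/{n}")) for n in names)

def check_task3(out_dir, ref_dir, fname="SRF_FGSPIN.10Y.ORC22v8034_19101231_sechiba_rest.nc"):
    # Task 3: compare dimensions and variable names of the sechiba restart file vs the reference.
    with Dataset(f"{out_dir}/{fname}") as out, Dataset(f"{ref_dir}/{fname}") as ref:
        dims_match = {d: len(v) for d, v in out.dimensions.items()} == \
                     {d: len(v) for d, v in ref.dimensions.items()}
        vars_match = set(out.variables) == set(ref.variables)
    return dims_match and vars_match
```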

Regarding the new implementation:

Task 1 checks everything as before, but we don't output the absolute error.
Task 2 checks everything as before.
Task 4 checks the [R2,slope,NRME].txt output to 2 s.f. vs the reference (a sketch of such a comparison is below). We do not check the *.png files or the stomate_rest.nc file; the latter seems easy to check.
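
For the task 4 comparison, rounding to two significant figures before comparing is straightforward; a sketch (not the code in the tests):

```python
from math import floor, log10

def round_sig(x, sig=2):
    # Round x to `sig` significant figures; 0.0 has no defined magnitude, so return it unchanged.
    return 0.0 if x == 0 else round(x, -int(floor(log10(abs(x)))) + (sig - 1))

def matches_to_2sf(out_values, ref_values):
    # Compare two equal-length sequences of floats (e.g. the R2 / slope / NRME columns).
    return len(out_values) == len(ref_values) and all(
        round_sig(a) == round_sig(b) for a, b in zip(out_values, ref_values)
    )
```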

It would be useful to put this in a tests_results.txt report as before, providing the absolute errors and contents as they were originally summarised.
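
Restoring the report could be as simple as collecting a one-line summary per task and writing it out; a minimal sketch (the exact wording of the original entries is what checking out the old code would confirm):

```python
def write_report(results, path="tests_results.txt"):
    # `results` maps task number to a summary string,
    # e.g. {1: "max abs error 3.2e-06", 2: "PASS", 3: "PASS"} (values here are made up).
    with open(path, "w") as fh:
        for task in sorted(results):
            fh.write(f"Task {task} reproducibility test: {results[task]}\n")
```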

ma595 commented 4 days ago

@dsgoll123 Since the issues raised above are not directly related to this PR, I'll merge it for now. Thanks for the review.