cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
62 stars 266 forks source link

Write out cross-validation results immediately #2483

Closed LukasBeiske closed 6 months ago

LukasBeiske commented 6 months ago

Fixes #2480

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (75a38ee) 92.44% compared to head (19791c9) 92.47%. Report is 21 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2483 +/- ## ========================================== + Coverage 92.44% 92.47% +0.02% ========================================== Files 234 234 Lines 19917 19965 +48 ========================================== + Hits 18413 18462 +49 + Misses 1504 1503 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

LukasBeiske commented 6 months ago

Good start, but you should

a) Check overwrite conditions in __init__: if output file exists and not overwrite, raise an error, if output file exists and overwrite=true truncate it (tables.open_file(path, mode='w').

b) Move the writing into the cv loop using write_table(..., append=True) to further reduce memory usage

c) Add writing of metadata to the output file

a) The error if file exists and not overwrite is already raised by the train tools via Tool.check_output.

a)/b) Using write_table(..., append=True, mode='w') does not work, does it? Isn't this basically the same as write_table(..., append=True, overwrite=True)?

c) Is there any other metadata requiered besides some column descriptions?

maxnoe commented 6 months ago

a) The error if file exists and not overwrite is already raised by the train tools via Tool.check_output.

Also for the cv output file and not just the model?

maxnoe commented 6 months ago

a)/b) Using write_table(..., append=True, mode='w') does not work, does it? Isn't this basically the same as write_table(..., append=True, overwrite=True)?

You need to truncate the file once in __init__ and then only use append=True

LukasBeiske commented 6 months ago

a) The error if file exists and not overwrite is already raised by the train tools via Tool.check_output.

Also for the cv output file and not just the model?

Yes, both output paths are passed and CrossValidator.ouput_path gets ignored if it is None