Ephemeris output in selected format

astronomerritt commented 2 months ago

Fixes #987.

Sorcha now correctly writes HDF5 files by appending each chunk to the existing dataset within the HDF5 file. Previously, it was writing each chunk as a separate dataset within the HDF5 file, which is non-intuitive and could not be easily read by Sorcha's HDF5DataReader.

Fixes #679.

When the -ew flag is set, ephemeris output will save in the same format as Sorcha output, as specified by the eph_format keyword in the [OUTPUT] section of the config file.
Changed unit tests.
Added a new unit test to ensure that ephemeris files are correctly written AND can be read back in by Sorcha.
Changed wording in example config files to make it clear that the eph_format config file variable also affects format of outputted ephemeris files. Removed an outdated reference to the temporary ephemeris file.

NOTE 1: because the ephemeris output can now be saved in different formats, the file extension SHOULD NOT BE INCLUDED on the call to the command line:

-ew ephemeris_output

NOT:

-ew ephemeris_output.csv

NOTE 2: the format in which ephemeris output can be saved is controlled by the eph_format keyword in the config file. The only available options are thus csv, whitespace or hdf5.

Review Checklist for Source Code Changes

[x] Does pip install still work?
[x] Have you written a unit test for any new functions?
[x] Do all the units tests run successfully?
[x] Does Sorcha run successfully on a test set of input files/databases?
[x] Have you used black on the files you have updated to confirm python programming style guide enforcement?

mschwamb commented 2 months ago

@Gerenjie can you test this out?

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.89%. Comparing base (6423254) to head (9aae130). Report is 3 commits behind head on main.

Files	Patch %	Lines
src/sorcha/modules/PPOutput.py	88.88%	1 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #984 +/- ## ========================================== + Coverage 80.79% 80.89% +0.10% ========================================== Files 69 69 Lines 3083 3094 +11 ========================================== + Hits 2491 2503 +12 + Misses 592 591 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mjuric commented 2 months ago

For Sorcha 1.1 -- could we consider allowing the extension, and also inferring the format from the extension?

It would avoid some of these problems (and also make it easier to shell-script Sorcha runs). Was this already considered & rejected because of some other side-effects?

mschwamb commented 2 months ago

@mjuric Never discussed because OIF did that. White space has lots of various extensions. I don't see it how it helps with the shell script... happy to discuss it on the next planned sorcha call

astronomerritt commented 2 months ago

I think I should probably write a unit test to cover ephemeris files being written in different formats.

Also, should SQL-format ephemeris files also be indexed?

mschwamb commented 2 months ago

I think it might take way too long if its being indexed given the sheer amount of rows. It might be better as a utility script.

astronomerritt commented 2 months ago

Good point.

Would strongly advise this isn't merged until a unit test is in, even if Jake finishes testing it first - I've just found (and fixed) a small bug I introduced when fixing the merge conflict. Writing the unit test now.

mschwamb commented 2 months ago

Unit tests would be good.... 😀

astronomerritt commented 2 months ago

Unit tests are in. These tests check to make sure Sorcha can actually read in the ephemeris files it writes. Probably should have had these tests a while ago.

Updated the docs also, plus I edited the default/example config files to make it clearer that the eph_format keyword is used for reading in AND writing out ephemeris files.

samncorn commented 1 month ago

Did a test running ephemeris generation and writing the file, then rerunning while loading in that ephemeris file. Numerically I'm getting a perfect match, but there are a couple of issues

When writing out the full output and reading in the ephemeris, the 'fieldJD_TDB' col is not written.
In all cases (even rerunning ephemeris generation), the order of the rows is not always the same. Not sure if we care too much about that, but it did make testing for differences a bit trickier.

Otherwise, performance looks good. I ran 1000 orbits through 1 year on baseline_v3.3 on a local linux machine. Runtime was ~2 min 30 sec with ephemeris generation, and ~10 sec reading in ephemeris.

mschwamb commented 1 month ago

I think this makes sense as the read in behavior is different than how ephemeris generation outputs stuff- Ephemeris generation takes everything in a chunk and figures out if it is in a pointing where as reading in the ephemeris we need to check that we've got everything in the input ephemeris file so have to read the whole thing multiple times since we can't assume that things are in order. We could do a sort but it adds complexity that doesn't seem needed.

dirac-institute / sorcha

Ephemeris output in selected format #984

Review Checklist for Source Code Changes

Codecov Report