boutproject / xBOUT

Collects BOUT++ data from parallelized simulations into xarray.
https://xbout.readthedocs.io/en/latest/
Apache License 2.0
21 stars 10 forks source link

Use run_id to sort files #250

Closed dschwoerer closed 9 months ago

dschwoerer commented 2 years ago

Also provide better error in some cases.

codecov-commenter commented 2 years ago

Codecov Report

Merging #250 (5c82b92) into master (afb5ba6) will increase coverage by 0.00%. The diff coverage is 67.44%.

@@           Coverage Diff           @@
##           master     #250   +/-   ##
=======================================
  Coverage   68.99%   68.99%           
=======================================
  Files          15       15           
  Lines        3209     3248   +39     
  Branches      780      794   +14     
=======================================
+ Hits         2214     2241   +27     
- Misses        733      743   +10     
- Partials      262      264    +2     
Impacted Files Coverage Δ
xbout/load.py 69.40% <67.44%> (-0.02%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

dschwoerer commented 1 year ago

@dschwoerer what is the use case motivating this change?

I was certain I had replied already. Sorry for not doing so.

The motivation is to allow something like BOUT.dmp.*.nc with having old, stale dump files.

I think the feature is nice, but the case I'd think of is where the run directories are not named in the expected way (e.g. run0/, run1/, etc.). For that this PR doesn't help (yet!). It should be possible to get run_restart_from as well as run_id and then (assuming that the runs are a set of consecutive restarts, which presumably they usually will be if they are being concatenated in the time dimension) sort run_ids, run_restart_from_list and paths_sorted so that run_restart_from_list[i] == run_ids[i-1].

That would also be nice but it is out of scope for this PR

Also it would be good to test the new functionality in test_load. Not sure how complicated that would be though, I guess you would need to add some functionality to generate some run_id values.

I didn't get the time to figure out how the tests work. And the tests being horribly slow does not help :-(

johnomotani commented 1 year ago

The motivation is to allow something like BOUT.dmp.*.nc with having old, stale dump files.

I'm not sure I'm quite following. This is already a nice improvement - that currently gives much nicer error messages, and could maybe be expanded to add more features - but it looks like this still opens all the files that are matched by the glob pattern. If I understand your comment, you want to handle something like running a simulation on 8 procs (the old stale simulation), then running a new one on 4 procs (the simulation we now want to open), and open with just BOUT.dmp.*.nc as the pattern. I agree this would be nice, but it seems like in this example e.g. BOUT.dmp.7.nc would still be added to paths_grid to be opened, although that is a 'stale file' - am I missing something?

dschwoerer commented 10 months ago

I would have liked to add tests, but I doubt to get to that anytime soon :-(