Closed evanhanders closed 2 years ago
Thanks! Hmm it's a good question about how to name things. Maybe what we should do is add something to the names of the virtual files when we initially create them, like snapshots/snapshots_s1.vh5
or snapshots/snapshots_s1v.h5
or snapshots/snapshots_s1_virtual.h5
. And then the merged file could have the regular name snapshots/snapshots_s1.h5
. I like this because in a world where parallel hdf5 worked perfectly, we would just be directly writing to the snapshots/snapshots_s1.h5
joint file without process files in the first place. Any thoughts @jsoishi?
I agree that we should reserve snapshots/snapshots_s1.h5 for the truly merged file because of the hope that someday we will have a functional parallel hDF5 mechanism, and I think that changing the file extension would be confusing, since .h5 is such an established standard.
What about snapshots/virt_snapshots_s1.h5? I'm afraid that snapshots_s1v.h5 might be too subtle...
On Sat, Apr 2, 2022 at 10:34 AM Keaton J. Burns @.***> wrote:
Thanks! Hmm it's a good question about how to name things. Maybe what we should do is add something to the names of the virtual files when we initially create them, like snapshots/snapshots_s1.vh5 or snapshots/snapshots_s1v.h5 or snapshots/snapshots_s1_virtual.h5. And then the merged file could have the regular name snapshots/snapshots_s1.h5. I like this because in a world where parallel hdf5 worked perfectly, we would just be directly writing to the snapshots/snapshots_s1.h5 joint file without process files in the first place. Any thoughts @jsoishi https://github.com/jsoishi?
— Reply to this email directly, view it on GitHub https://github.com/DedalusProject/dedalus/pull/184#issuecomment-1086653146, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7SHYPHY7IQKLT2W2SB47TVDBLFVANCNFSM5Q7HZNRQ . You are receiving this because you were mentioned.Message ID: @.***>
Maybe the merge script should just overwrite the virtual file, using the same name? That way no plotting scripts, etc., downstream have to adjust to the change. I think this makes sense since the point of virtual files is to appear just like merged files, so in theory nothing in the workflow should care if the file is virtual or concrete. This would also avoid having both virtual and concrete merged files side-by-side, which could result in duplicated plots or similar if you're doing analysis like python3 plot_file.py snapshots/*.h5
.
Even better idea. +1
On Mon, Apr 4, 2022 at 10:33 AM Keaton J. Burns @.***> wrote:
Maybe the merge script should just overwrite the virtual file, using the same name? That way no plotting scripts, etc., downstream have to adjust to the change. I think this makes sense since the point of virtual files is to appear just like merged files, so in theory nothing in the workflow should care if the file is virtual or concrete. This would also avoid having both virtual and concrete merged files side-by-side, which could result in duplicated plots or similar if you're doing analysis like python3 plot_file.py snapshots/*.h5.
— Reply to this email directly, view it on GitHub https://github.com/DedalusProject/dedalus/pull/184#issuecomment-1087635177, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7SHYKXJIAHHGADUVUGAMTVDL4R3ANCNFSM5Q7HZNRQ . You are receiving this because you were mentioned.Message ID: @.***>
This was actually the first implementation I tried (just replacing the virtual file with a merged file of the same name), but I was afraid this was a bit risky because it basically sets cleanup=True by default. Is that OK though?
I think there's two (almost identical) ways to make this happen. What's your preference?
I think the first isn't too bad, and it doesn't have to (and probably shouldn't) delete the process files by default, just overwrite or move/delete the virtual merged file in place of the concrete merged file. We can also clean things up to make it easier to recreate the virtual merged file from the process files in post-processing, instead of just online.
It would also be useful to have a merge that works without a virtual file.
Marking as a draft since we're still WIP. I'm free the next few days/week if we want to push more on this.
I'm happy to make a push ~tomorrow and early next week to get this together. Just to check, here's what we want to do, right?
Anything else?
@kburns, @jsoishi, @lecoanet -- This PR has been updated with the requested changes:
Thanks!
BTW: some incoming changes to HDF5 files in d3 on my fork.
On Thu, Apr 28, 2022 at 3:15 PM Evan Anders @.***> wrote:
I'm happy to make a push ~tomorrow and early next week to get this together. Just to check, here's what we want to do, right?
- Clean up the 'merged_' files that this does right now (as we discussed a few weeks ago
- Get a merge that creates the virtual file working
Anything else?
— Reply to this email directly, view it on GitHub https://github.com/DedalusProject/dedalus/pull/184#issuecomment-1112567324, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7SHYN2Z7K3CUTRLDO3VYLVHLPT3ANCNFSM5Q7HZNRQ . You are receiving this because you were mentioned.Message ID: @.***>
This PR fixes #182 by adding two functions to tools/post.py (merge_virtual_analysis(), merge_virtual_file()) and updating merge_setup() to allow virtual files to be merged into single files.
This file also brings merge_setup() from d2 -> d3, merging together the scales (which are now distributed and hashed, unlike in d2).
The workflow of the virtual file merge is:
The workflow of the full merge from partial files is identical to d2 (creates snapshots_s1.h5 from snapshots_s1/snapshots_s1_p*.h5).
@kburns and @jsoishi , it would be great to have both of you review this; I don't think I am allowed to assign reviewers.