Closed evanhanders closed 2 years ago
Yeah, it would definitely be nice to have some flavor of this. It's just a shame that it (and the current FileHandler in general) requires so much boilerplate; as far as I've seen, there's no simpler copy mechanism that can merge together virtual datasets. I hate merging in general, so it would be really nice to get parallel HDF5 or another parallel-write system like zarr working, maybe as separate FileHandler subclasses.
I agree, it would be great to have parallel file writes (and if I understand what the issues have been with getting that working, I would probably be willing to lead the effort on implementing it over the next few months).
But in the short term, I guess I'm wondering if I should clean this up a bit for integration into d3? I think a bit of the boilerplate can be removed or streamlined. Otherwise, I can just post this .py script as a tool to the user group for people who are interested. Just trying to figure out the right way to move forward and recover this functionality for people who need a tool like this.
Yeah it would be great to have a PR for adding this, thanks!
Sounds good! I'll work on getting a preliminary PR together this week or so.
Virtual files (and the partial files they read) are starting to cause me issues with my filesystem quotas on Pleiades. I've created a little script based on d2's merging logic that merges a virtual file into a single, merged dataset (attached here, in .txt format: merge_virtual_files.py.txt).
If there's interest, I can improve this and work towards a pull request to put this into d3's tools/post.py file.
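For anyone who wants the general idea without opening the attachment, here is a minimal sketch of merging a virtual file with plain h5py. This is not the attached script and not d2's merging logic; the function name `merge_virtual_file` is hypothetical. It assumes each dataset fits in memory, and it deliberately skips the dimension-scale bookkeeping attributes (`DIMENSION_LIST`, `REFERENCE_LIST`), so scales would need to be re-attached separately.

```python
import h5py

# Attributes that hold object references for HDF5 dimension scales.
# They cannot be copied verbatim into a different file, so this sketch skips them.
SCALE_ATTRS = {"DIMENSION_LIST", "REFERENCE_LIST"}


def copy_attrs(src_obj, dst_obj):
    """Copy plain attributes, leaving out dimension-scale references."""
    for key, value in src_obj.attrs.items():
        if key not in SCALE_ATTRS:
            dst_obj.attrs[key] = value


def merge_virtual_file(virtual_path, merged_path):
    """Write every group and dataset of `virtual_path` into `merged_path`
    as real (non-virtual) data. Hypothetical helper, for illustration only."""
    with h5py.File(virtual_path, "r") as src, h5py.File(merged_path, "w") as dst:
        copy_attrs(src, dst)

        def copy_item(name, obj):
            if isinstance(obj, h5py.Group):
                new = dst.require_group(name)
            else:
                # Reading with [()] resolves the virtual mapping and pulls the
                # data from the partial files into one in-memory array.
                new = dst.create_dataset(name, data=obj[()])
            copy_attrs(obj, new)

        # visititems walks the whole tree, calling copy_item on each member.
        src.visititems(copy_item)
```

Once the merged file has been checked, the virtual file and its partial files can be removed to free up quota. For large outputs, the single `obj[()]` read would probably need to be replaced by a chunked copy.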