Closed: mabruzzo closed this 1 day ago
This all seems fine to me, but I'd like @helenarichie or @bcaddy to take a look, since I think they are the main ones who have been using the scripts in their existing form.
I think this looks good! I like having the default behavior be that you don't have to specify the number of processes. It might be nice to also do that with the snapshot numbers at some point. I also don't have any issues with changing the way the snapshot numbers are specified.
I am having a little bit of trouble figuring out which arguments are required/what the default behavior is and was wondering if you could clarify. For example, if I run this and my input directory contains 2D and 3D snapshots will it just go ahead and concatenate all of them unless I tell it to omit certain types of output? Sorry if I'm missing something obvious!
It would also probably be good to update the documentation with these changes.
> I think this looks good! I like having the default behavior be that you don't have to specify the number of processes.
And it is less error-prone!
> It might be nice to also do that with the snapshot numbers at some point.
I think that could definitely be nice! One of my goals here is achieving more sensible default behavior with fewer arguments (along those lines, it could be cool to do something similar with the `-s` or `-o` flag).
My next primary concern is introducing some level of parallelism.
> I am having a little bit of trouble figuring out which arguments are required/what the default behavior is and was wondering if you could clarify. For example, if I run this and my input directory contains 2D and 3D snapshots will it just go ahead and concatenate all of them unless I tell it to omit certain types of output? Sorry if I'm missing something obvious!
The behavior of `concat_3d_data.py` and `concat_2d_data.py` is unchanged in this regard.
For `concat.py` you need to opt in for each type of output with the `--kind` parameter. Consequently:

- `--kind 3D` just does 3D data
- `--kind slice` just does slices (`--kind slice-xy,xz` will omit data from the `yz` datasets)
- `--kind proj` just does axis-aligned projections
- `--kind rot_proj` just does rotated projections
- `--kind 3D slice proj rot_proj` would do all of the above

Once we add support for concatenating particles, I might support `--all-kinds` to handle all types of supported datasets.
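The opt-in behavior described above could be expressed with a standard `argparse` flag that accepts multiple values. This is an illustrative sketch, not the actual `concat.py` source; the flag name and values come from the discussion, but the parser itself is hypothetical:

```python
import argparse

# Illustrative sketch of a --kind flag that accepts one or more dataset
# kinds, as described in the PR (not the real concat.py implementation).
parser = argparse.ArgumentParser(prog="concat.py")
parser.add_argument(
    "--kind",
    nargs="+",      # e.g. --kind 3D slice proj rot_proj
    required=True,  # each dataset kind must be opted into explicitly
    help="dataset kinds to concatenate (3D, slice, proj, rot_proj, ...)",
)

args = parser.parse_args("--kind 3D slice proj rot_proj".split())
print(args.kind)  # every kind listed is concatenated; omitted kinds are skipped
```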
> It would also probably be good to update the documentation with these changes.
I'm happy to do that. I might hold off until I have your approval/this is merged.
This is great. I'm totally for it
Ah, I see. I had gotten myself a little confused before... I appreciate the clarification!
I think this is good to go!
I'll go ahead and merge this. @mabruzzo, feel free to update the wiki when you get a chance.
This introduces a few changes to the concatenation scripts. These are all surface-level changes (they don't affect the output format -- that is still something I want to address).
There are essentially 3 changes:

1. Introduced `concat.py`, which can be used to concatenate more than one kind of dataset at a time (3D-cubes or multiple kinds of 2D datasets).
2. Replaced `--concat-outputs` with `--snaps` (this is a subjective improvement). `--snaps` is of the form `START:STOP[:STEP]` (just like a Python slice). For comparison, you would specify a range with `FIRST-LAST` with `--concat-outputs` (`LAST` is included in the range). You can still use `--concat-outputs`, but you get a deprecation warning.
3. You no longer need to specify the number of processes. You can still use `--num-processes`, but you get a warning about it.

I also haven't really touched the particle-concatenation script. Based on how code is shared, change number 2 will affect that script. But propagating change 3 would take some additional work.
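As a minimal sketch of how a `START:STOP[:STEP]` value maps onto Python slice semantics (the helper name and implementation here are hypothetical, not the PR's actual code):

```python
# Hypothetical helper showing how a --snaps value of the form
# START:STOP[:STEP] could be turned into a Python range.
def parse_snaps(spec: str) -> range:
    parts = spec.split(":")
    if len(parts) == 2:
        start, stop = (int(p) for p in parts)
        return range(start, stop)
    elif len(parts) == 3:
        start, stop, step = (int(p) for p in parts)
        return range(start, stop, step)
    raise ValueError(f"expected START:STOP[:STEP], got {spec!r}")

# Like a Python slice, STOP is excluded -- unlike the old FIRST-LAST
# form, where LAST was included in the range.
print(list(parse_snaps("0:10:2")))  # [0, 2, 4, 6, 8]
```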
I'm definitely open to any kind of feedback!
In a subsequent PR, it's my intention to modify things so that we can optionally use multiprocessing or mpi4py to speed up concatenation (I have an idea on how to do that in a minimally invasive manner that won't affect hypothetical dask-usage).
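As a rough illustration of that minimally invasive approach: if each snapshot is concatenated independently, a serial loop over snapshots can be swapped for a `multiprocessing.Pool` without restructuring the rest of the script. `concat_snapshot` here is a hypothetical stand-in for the real per-snapshot work, not code from this PR:

```python
from multiprocessing import Pool

# Hypothetical stand-in for the real per-snapshot concatenation work
# (open the per-process files for one snapshot, write the output).
def concat_snapshot(snap_number: int) -> int:
    return snap_number

def concat_all(snaps, processes=None):
    # processes=None lets multiprocessing pick a sensible default,
    # mirroring the "no --num-processes needed" behavior of the scripts.
    with Pool(processes) as pool:
        return pool.map(concat_snapshot, snaps)

if __name__ == "__main__":
    print(concat_all(range(0, 6, 2)))  # [0, 2, 4]
```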