Open clbarnes opened 2 years ago
To address this at Janelia, a companion .keep file is written after each .dat file write completes.
For example:
/cygdrive/d/UploadFlags/0522-09_ZF-Card^E^^Images^Zebrafish^Y2022^M07^D12^Merlin-6257_22-07-12_153254_0-0-1.dat^keep
is written for
/cygdrive/e/Images/Zebrafish/Y2022/M07/D12/Merlin-6257_22-07-12_153254_0-0-1.dat
I'm not sure how/where this is done since I'm just a consumer of this data, but it might be available to you already.
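For anyone consuming these flags, here is a rough sketch of mapping a keep file name back to the .dat path it refers to. The encoding is only guessed from the single example above (a prefix, then a drive letter, then `^^`, then the path with `^` as separator, then a trailing `^keep`), so treat the rules and the function name `dat_path_from_keep_name` as assumptions rather than the actual Janelia convention.

```python
def dat_path_from_keep_name(keep_name: str) -> tuple[str, str]:
    """Return (prefix, cygwin-style dat path) for a keep file name.

    ASSUMPTION: the layout is inferred from one example only:
        <prefix>^<DRIVE>^^<dir>^<dir>^...^<file>.dat^keep
    """
    suffix = "^keep"
    if not keep_name.endswith(suffix):
        raise ValueError(f"not a keep file name: {keep_name!r}")
    stem = keep_name[: -len(suffix)]

    # "^^" appears to stand in for the ":\" after the drive letter.
    head, sep, tail = stem.partition("^^")
    if not sep:
        raise ValueError(f"no drive marker in: {keep_name!r}")

    # Everything before the last "^" in the head is the prefix;
    # the final component is the drive letter.
    prefix, _, drive = head.rpartition("^")

    dat_path = f"/cygdrive/{drive.lower()}/" + tail.replace("^", "/")
    return prefix, dat_path
```

Called on the file name from the example, this gives back the prefix `0522-09_ZF-Card` and the `/cygdrive/e/...` path shown above.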
I like the simplicity of your suggested .part naming scheme. The .keep file names are horrid because they embed so much information in the name.
However, a few advantages to the .keep file approach are:
0522-09_ZF-Card
in the example above) that is useful for organizing the data post transfer. This could be pulled from .dat header data instead.
I'm not a big fan of the .keep file setup, but I thought it was worth mentioning that it exists and how we currently use it.
Thanks! That is another way of doing it.
A halfway house would be to have the part files kept in a parallel directory hierarchy (under an in_progress/ directory or something) and then moved into the complete/ hierarchy. So long as they're on the same file system, this should be just as fast, since a move within one file system is only a rename, while keeping the first advantage you listed. There could be an equivalent processed/ hierarchy which satisfies the second advantage. I think the third property is probably best addressed another layer up, if possible.
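To make the parallel-hierarchy idea concrete, here is a minimal sketch. The function name `promote` and the root arguments are hypothetical, and it assumes both roots live on the same file system so that `os.replace` is a plain rename rather than a copy.

```python
import os
from pathlib import Path


def promote(finished_file: Path, in_progress_root: Path, complete_root: Path) -> Path:
    """Move a fully written file from the in-progress tree to the same
    relative location in the complete tree, and return the new path.

    ASSUMPTION: both roots are on the same file system; os.replace raises
    OSError if they are not.
    """
    relative = finished_file.relative_to(in_progress_root)
    target = complete_root / relative
    # Mirror the directory structure on the complete side.
    target.parent.mkdir(parents=True, exist_ok=True)
    os.replace(finished_file, target)
    return target
```

A processed/ tree could be handled the same way, by calling `promote` again with complete/ as the source root once post-processing is done.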
Mentioned elsewhere but worth its own issue:
It would be really helpful for downstream processing purposes for the in-process writing to be done to some file which is named differently to the final output, and then at the end of the process, rename it. Currently, it's nontrivial to tell whether a file is still being written to or whether it's complete. For the purposes of per-slice post-processing (e.g. converting to a sensible format), it would be nice to regularly run a script which just looks for files of the right name and deals with them.
This should be a relatively small change: the current software should just write to f"{currentname}.part" and then do rename(f"{currentname}.part", currentname) at the end of the process.
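A minimal sketch of both sides of this, in Python for illustration only (the acquisition software itself may well not be Python). The names `write_then_rename` and `complete_dat_files` are made up for the example, and it assumes the .part file sits in the same directory as the final file, so the rename never crosses file systems.

```python
import os
from pathlib import Path

PART_SUFFIX = ".part"  # suffix proposed in this issue


def write_then_rename(final_path: Path, data: bytes) -> None:
    """Writer side: write to '<name>.part', then rename to '<name>'."""
    part_path = final_path.with_name(final_path.name + PART_SUFFIX)
    with open(part_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the rename
    # The final name only ever appears once the content is complete.
    os.replace(part_path, final_path)


def complete_dat_files(root: Path) -> list[Path]:
    """Consumer side: every '*.dat' under root is complete by construction.

    'x.dat.part' does not match the '*.dat' pattern, so files still being
    written are skipped without any extra check.
    """
    return sorted(root.rglob("*.dat"))
```

With this in place, the periodic post-processing script reduces to looping over `complete_dat_files(root)`, with no need to guess from sizes or modification times whether a file is finished.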