JaneliaSciComp / jeiss_fibsem_labview_control

BSD 3-Clause "New" or "Revised" License

Faux-atomic write #8

Open clbarnes opened 2 years ago

clbarnes commented 2 years ago

Mentioned elsewhere but worth its own issue:

It would be really helpful for downstream processing if in-progress writes went to a file named differently from the final output, and that file were renamed only at the end of the process. Currently, it's nontrivial to tell whether a file is still being written to or whether it's complete. For the purposes of per-slice post-processing (e.g. converting to a sensible format), it would be nice to regularly run a script which just looks for files of the right name and deals with them.

This should be a relatively small change: the current software should just write to f"{currentname}.part" and then do rename(f"{currentname}.part", currentname) at the end of the process.
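
To make the suggestion concrete, here is a minimal Python sketch of the write-then-rename pattern, plus the kind of downstream scan it enables. It is illustrative only: the acquisition software is LabVIEW, and the function names, the chunked-bytes input, and the ".dat" suffix filter are assumptions made for this example, not anything taken from the existing code.

```python
import os
from pathlib import Path


def write_then_rename(final_path: Path, chunks) -> None:
    """Write all chunks to '<name>.part', then rename to the final name.

    `chunks` is any iterable of bytes objects (hypothetical input shape).
    """
    part_path = final_path.with_name(final_path.name + ".part")
    with open(part_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        # Push the data to disk before the file becomes visible
        # under its final name.
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic when source and target are on the same
    # file system, so readers see either no file or the full file.
    os.replace(part_path, final_path)


def completed_files(directory: Path, suffix: str = ".dat"):
    """Return finished files only; '*.dat.part' files have suffix '.part'."""
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == suffix
    )
```

With this in place, a periodic post-processing script could call completed_files() on the output directory and safely treat every path it returns as fully written.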

trautmane commented 2 years ago

To address this at Janelia, a companion .keep file is written after each dat file write completes.

For example:

/cygdrive/d/UploadFlags/0522-09_ZF-Card^E^^Images^Zebrafish^Y2022^M07^D12^Merlin-6257_22-07-12_153254_0-0-1.dat^keep

is written for

/cygdrive/e/Images/Zebrafish/Y2022/M07/D12/Merlin-6257_22-07-12_153254_0-0-1.dat

I'm not sure how/where this is done since I'm just a consumer of this data, but it might be available to you already. I like the simplicity of your suggested .part naming scheme - the .keep file names are horrid because they are embedding so much information into the name.

However, a few advantages to the .keep file approach are:

I'm not a big fan of the .keep file setup, but I thought it was worth mentioning that it exists and how we currently use it.

clbarnes commented 2 years ago

Thanks! That is another way of doing it.

A halfway house would be to have the part files kept in a parallel directory hierarchy (under an in_progress/ directory or something) and then moved into the complete/ hierarchy. So long as both hierarchies are on the same file system, the move is just a rename and should be just as fast, while keeping the first advantage you listed. There could be an equivalent processed/ hierarchy which satisfies the second advantage. I think the third property is probably best addressed another layer up, if possible.
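
A rough Python sketch of that parallel-hierarchy move, assuming the in-progress and complete roots sit on the same file system. The function name promote and its parameters are hypothetical and only illustrate the idea.

```python
import os
from pathlib import Path


def promote(part_file: Path, in_progress_root: Path, complete_root: Path) -> Path:
    """Move a finished file to the same relative path under complete_root."""
    # Path of the file relative to the in-progress hierarchy,
    # e.g. Y2022/M07/D12/name.dat
    relative = part_file.relative_to(in_progress_root)
    target = complete_root / relative
    # Mirror the directory structure on the complete side.
    target.parent.mkdir(parents=True, exist_ok=True)
    # A rename within one file system is atomic and does not copy data.
    # Across file systems os.replace raises OSError instead.
    os.replace(part_file, target)
    return target
```

The same function could be reused for a later step, moving files from the complete hierarchy into a processed one once post-processing has finished.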