Open clbarnes opened 2 years ago
Many filesystems have trouble with operations like listing when a directory contains thousands of files, so I prefer to have some hierarchy in the directory structure.
Unless we anticipate single runs going for many thousands of days, could it just be collapsed into daily directories, then?
We currently have a data set being imaged that will run for months and produces 260+ .dat files per hour, so I've introduced hourly subdirectories for that data set.
That's good to know, thank you! Hard to know what the constraints are when we're clearly working on different scales.
What's the point of storing DAT files under year, month, day subdirectories? It makes no difference to us, especially as we write out the date/time in the file name and the accompanying CSV. It just adds an extra level of indirection for the next tool sweeping through the directory, and it means that the CSVs have somewhat arbitrary names.

The day-wise CSVs duplicate information from the month-wise, the month-wise duplicate information from the year-wise, and there is no single CSV for the entire run (unless your run completes within a single day, month, or year). Datetime is useful information to have, but there's no need for it to define the structure of the output data.
One exception I could see is if runs are expected to hit inode limits within a single directory, which can be fairly low on certain HPC file systems. In that case, a single level of indirection (day-level directories, ISO-8601 formatted, with a single root CSV) would be preferable.
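The scheme suggested above can be sketched in a few lines. This is a hypothetical illustration, not the project's actual writer: the `write_dat` function, the `index.csv` name, and the CSV columns are all assumptions made up for this sketch. It places each `.dat` file in an ISO-8601 day directory under the run root and appends one row per file to a single CSV at the root.

```python
import csv
from datetime import datetime
from pathlib import Path


def write_dat(root: Path, payload: bytes, stem: str, timestamp: datetime) -> Path:
    """Write a .dat file into a day-level (ISO-8601) subdirectory of `root`
    and append its metadata to a single index.csv at the root of the run.

    Hypothetical sketch: function name, CSV name, and columns are assumptions.
    """
    # One directory per day keeps per-directory file counts bounded
    # without nesting year/month/day levels.
    day_dir = root / timestamp.date().isoformat()  # e.g. root/2023-05-17
    day_dir.mkdir(parents=True, exist_ok=True)

    dat_path = day_dir / f"{stem}.dat"
    dat_path.write_bytes(payload)

    # Single run-wide CSV, so no day/month/year-wise duplication.
    index_path = root / "index.csv"
    is_new = not index_path.exists()
    with index_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["filename", "timestamp"])
        writer.writerow(
            [dat_path.relative_to(root).as_posix(), timestamp.isoformat()]
        )
    return dat_path
```

A downstream tool then only has to walk one predictable level of day directories, or skip the hierarchy entirely and read `index.csv` to enumerate every file in the run.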