Open clbarnes opened 2 years ago
Many filesystems have trouble with operations like listing when a directory contains thousands of files, so I prefer to have some hierarchy in the directory structure.
Unless we anticipate single runs going for many thousands of days, could it just be collapsed into daily directories, then?
We currently have a data set being imaged that will run for months and produces 260+ .dat files per hour, so I've introduced hourly subdirectories for that data set.
That's good to know, thank you! Hard to know what the constraints are when we're clearly working on different scales.
What's the point of storing DAT files under year, month, day subdirectories? It makes no difference to us, especially as we write out the date/time in the file name and the accompanying CSV. It just adds an extra level of indirection for the next tool sweeping through the directory, and it means that the CSVs have somewhat arbitrary names.

The day-wise CSVs duplicate information from the month-wise, the month-wise duplicate information from the year-wise, and there is no single CSV for the entire run (unless your run completes within a single day, month, or year). Datetime is useful information to have, but there's no need for it to define the structure of the output data.
One exception I could see is if runs are expected to hit inode limits within a single directory, which can be fairly low on certain HPC file systems. In that case, a single level of indirection (day-level directories, ISO-8601 formatted, with a single root CSV) would be preferable.
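The scheme suggested above can be sketched in a few lines. This is a hypothetical illustration, not the project's actual writer: the `write_dat` function, the `index.csv` name, and the CSV columns are all assumptions made up for this sketch. It places each `.dat` file in an ISO-8601 day directory under the run root and appends one row per file to a single CSV at the root.

```python
import csv
from datetime import datetime
from pathlib import Path


def write_dat(root: Path, payload: bytes, stem: str, timestamp: datetime) -> Path:
    """Write a .dat file into a day-level (ISO-8601) subdirectory of `root`
    and append its metadata to a single index.csv at the root of the run.

    Hypothetical sketch: function name, CSV name, and columns are assumptions.
    """
    # One directory per day keeps per-directory file counts bounded
    # without nesting year/month/day levels.
    day_dir = root / timestamp.date().isoformat()  # e.g. root/2023-05-17
    day_dir.mkdir(parents=True, exist_ok=True)

    dat_path = day_dir / f"{stem}.dat"
    dat_path.write_bytes(payload)

    # Single run-wide CSV, so no day/month/year-wise duplication.
    index_path = root / "index.csv"
    is_new = not index_path.exists()
    with index_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["filename", "timestamp"])
        writer.writerow(
            [dat_path.relative_to(root).as_posix(), timestamp.isoformat()]
        )
    return dat_path
```

A downstream tool then only has to walk one predictable level of day directories, or skip the hierarchy entirely and read `index.csv` to enumerate every file in the run.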