cortex-lab / Suite2P

Tools for processing 2P recordings

Registration results file paths hell #132

Closed fralik closed 6 years ago

fralik commented 6 years ago

Hi!

I was wondering if there are any plans to straighten out the file paths for registration output. What we have now:

  1. RegFileRoot - location for binary file.
  2. ResultsSavePath - some results go here, seems not to be registration related.
  3. RegFileTiffLocation - if provided, saves registration results as tiff files in that folder. The sole purpose of this option seems to be to specify whether or not to save registration results as tiff files.
  4. RegFileBinLocation - if provided, copies the bin file with registration results to the specified location, but that is not its only use.
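
For orientation, the four options could be set in a configuration script roughly like this. It is a minimal sketch: the struct name `ops` is an assumption made for illustration, and the values are examples rather than defaults taken from the repository.

```matlab
% Illustrative values only; the struct name `ops` is an assumption.
ops = struct();
ops.RegFileRoot         = 'C:/DATA/';    % where the registered binary (.bin) is written
ops.ResultsSavePath     = 'D:/DATA/F';   % regops and F files
ops.RegFileTiffLocation = [];            % empty: do not write registered tiffs
ops.RegFileBinLocation  = [];            % empty: do not copy the binary elsewhere

% The tiff option effectively acts as a switch: tiffs are written only if it is set.
saveTiffs = ~isempty(ops.RegFileTiffLocation);
```

Written this way, it is visible that two of the four "paths" double as on/off switches, which is the source of the confusion described in this issue.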

Looks kind of OK, but the usage is cryptic. Here is how these variables are used. There are several files created:

  1. regops file. Saved under ResultsSavePath. Why not in RegFileRoot?!
  2. plane binary files. Binary (.bin) files created for each processed plane, saved in RegFileRoot.
  3. Related to previous file type, interpolated plane binary. Binary (.bin) files created for each processed plane if interpolation across planes is used. Saved in a folder that starts with RegFileBinLocation. Why not in RegFileRoot?!
  4. F file. Saved under ResultsSavePath.
  5. plane tiff. Tiff files for each processed plane. Saved in a folder that starts with RegFileTiffLocation.
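
The list above can be condensed into a small lookup from output type to the option that decides its location. This is only a summary sketch; the field names on the left are labels invented for this example, not identifiers from Suite2P.

```matlab
% Output type -> option that decides where it is saved (per the list above).
% The field names are labels invented for this sketch.
savedUnder = struct( ...
    'regops',             'ResultsSavePath', ...
    'planeBinary',        'RegFileRoot', ...
    'interpolatedBinary', 'RegFileBinLocation', ...
    'Ffile',              'ResultsSavePath', ...
    'planeTiff',          'RegFileTiffLocation');

% Print the mapping, one output type per line.
types = fieldnames(savedUnder);
for i = 1:numel(types)
    fprintf('%-20s -> %s\n', types{i}, savedUnder.(types{i}));
end
```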

From this list of 5 file types it seems like 4 are registration related. However, all of them are saved in different folders! Here is an overview in table form.

              RegFileRoot  ResultsSavePath  RegFileTiffLocation  RegFileBinLocation
regops        -            x                -                    -
plane binary  x            -                -                    -
interpolated  -            -                -                    x
plane tiff    -            -                x                    -

My suggestion: make it simple and clean. Basically, throw away all the options but one - ResultsSavePath. If you think that registration results must be placed in a different user-specified folder, then have one additional option, RegistrationSavePath. Otherwise, just put them under ResultsSavePath, i.e. ResultsSavePath/registration. The RegFileTiffLocation option is in reality a boolean saveRegistrationTiff option, i.e. its sole purpose is to give the user a choice of whether to save results as tiff(s) or not.
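
A minimal sketch of how the suggested layout could be resolved in code. RegistrationSavePath and saveRegistrationTiff are the names proposed in this comment, not existing Suite2P options, and the struct name `ops` is an assumption.

```matlab
% Proposed layout; RegistrationSavePath and saveRegistrationTiff are
% hypothetical option names from the suggestion above.
ops = struct();
ops.ResultsSavePath      = 'D:/DATA/F';
ops.saveRegistrationTiff = false;   % plain boolean instead of a path

% Use the dedicated folder if the user set one, otherwise fall back to
% a "registration" subfolder of ResultsSavePath.
if isfield(ops, 'RegistrationSavePath') && ~isempty(ops.RegistrationSavePath)
    regDir = ops.RegistrationSavePath;
else
    regDir = fullfile(ops.ResultsSavePath, 'registration');
end
```

With this arrangement every registration output lives under one folder, and the tiff decision no longer depends on whether a path happens to be empty.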

It is also worth mentioning that the usage of RegFileRoot contradicts the README file, where it says "All of these filepaths are completed with separate subfolders per animal and experiment." RegFileRoot is not completed with subfolders.

I can make a pull request for it if you think this sounds OK. This is, however, a breaking change. I do not know how you handle Suite2P versions and what your release plans are.

I am abstracting all these things away in the work I mentioned in #131, but it would be much better if we could clear things up as a first step.

carsen-stringer commented 6 years ago

Please read the information in the master_file_example.m about these file paths. The RegFileRoot is supposed to be on an SSD (which doesn't have unlimited capacity). The code will be much slower if you use your normal hard drives. Also, at least in our case, and potentially many others, the data is located on a server, not locally, so people may want to save registered tiffs there (RegFileTiffLocation). We don't use the registered tiff functionality, but others do. The RegFileBinLocation is somewhere (not your SSD, because your SSD has limited capacity and a binary file can be around 100 GB) where you can copy the binary if you want to keep it for further processing.

You can do whatever you want with the pipeline, but your speed will suffer if you choose to do all the processing on a hard drive. I'll fix the readme RegFileRoot description.
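
The workflow described here (process on the SSD, then optionally copy the binary to larger storage) can be sketched as follows. This is not the actual Suite2P code; the struct name `ops`, the file name `plane1.bin` and the archive path are placeholders chosen for illustration.

```matlab
% Sketch only: `ops`, 'plane1.bin' and the archive path are placeholders.
ops = struct();
ops.RegFileRoot        = 'C:/DATA/';          % SSD: fast but limited in capacity
ops.RegFileBinLocation = 'D:/DATA/archive';   % larger, slower disk

binFile = fullfile(ops.RegFileRoot, 'plane1.bin');

% ... registration and later processing read and write binFile on the SSD ...

% Keep a copy on the larger disk only if a location was given.
if ~isempty(ops.RegFileBinLocation) && exist(binFile, 'file') == 2
    if ~exist(ops.RegFileBinLocation, 'dir')
        mkdir(ops.RegFileBinLocation);
    end
    copyfile(binFile, ops.RegFileBinLocation);
end
```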

fralik commented 6 years ago

I see. I read the master file example of course. Perhaps the naming of options is cryptic. My point was that it is very hard to figure out which files will be saved where from the option names and comments alone.

I see these things from your description:

  1. Raw data is/can be stored remotely.
  2. Some intermediate results should be stored on an SSD in order to improve speed (the SSD can be remote as well, but let's assume it is local. :)
  3. Some data should be transferred to a remote location after processing is done on the SSD, because SSDs are typically small in capacity.

You can make this happen with four parameters: ResultsSavePath, FastStoragePath, SaveRegistrationTiff and SaveIntermediateResults.

I end up with the same number of parameters - 4, but only 2 of them are paths. The other two can be boolean. Also, having this FastStoragePath, you can actually store everything you want (for speed) there before you transfer it to a different location.

Here is what the usage table will look like after the pipeline is finished:

              ResultsSavePath  FastStoragePath  SaveRegistrationTiff  SaveIntermediateResults
regops        x                -                -                     -
plane binary  x                -                -                     true
interpolated  x                -                -                     true
plane binary  -                x                -                     false
interpolated  -                x                -                     false
plane tiff    x                -                true                  -

Fast storage will be used during processing regardless of the SaveIntermediateResults value. The only difference is that the results are copied or moved to the bigger storage when SaveIntermediateResults is true.

My naming is perhaps not the best and different words should be chosen. With paths from the master file example it will be like this:

ResultsSavePath = 'D:/DATA/F';
FastStoragePath = 'C:/DATA/';
SaveRegistrationTiff = false;
SaveIntermediateResults = false;
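
With settings like these, the end-of-run step could look roughly as follows. This is a minimal sketch of the proposal, not existing Suite2P behaviour: the option names come from this comment, and the file name `plane1.bin` is a placeholder.

```matlab
% Proposed options (names from the proposal above, not existing Suite2P options).
ResultsSavePath         = 'D:/DATA/F';
FastStoragePath         = 'C:/DATA/';
SaveIntermediateResults = false;

binFile = fullfile(FastStoragePath, 'plane1.bin');   % placeholder file name

% Processing always runs against FastStoragePath. Afterwards the binary is
% moved to the larger storage only when the user asked to keep it there.
if SaveIntermediateResults && exist(binFile, 'file') == 2
    if ~exist(ResultsSavePath, 'dir')
        mkdir(ResultsSavePath);
    end
    movefile(binFile, ResultsSavePath);
end
```

With SaveIntermediateResults set to false, as in the example values, the binary simply stays in FastStoragePath, matching the last rows of the usage table.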