cortex-lab / Suite2P

Tools for processing 2P recordings
Other
122 stars 66 forks source link

Add storage manager to deal with folder structure #139

Closed fralik closed 6 years ago

fralik commented 6 years ago

Hi! This series of commits bring support of different kinds of folder structure to Suite2P. The intention is to decouple folder structure (file organization) from the code. This is done through an interface BasicStorageManager. It is a class that does several things: it handles db entries, it knows about file paths, it provides options.

Suite2P has a very tight coupling between db and options. I tried not to ruin it completely and options inside storage manage can be viewed as former ops0 variable.

The interface has a number of abstract methods and this is how folder structure is decoupled. Code that interacts with the interface knows that a particular method returns a full file path. It doesn't need to know anything else. The implementation of the interface will know a particular folder structure and will construct the appropriate file path. I include one implementation, called CortexLabStorageManager, which can be used as default and it implements folder structure as follows:

\RootStorage\mouse_name\session\block*.tif(f)

Note, however, that the interface has a different way of providing information about files. There is a db entry, which describes experiments that you would like to be processed. Each db entry may consist of one or several partitions. In former Suite2P code, partition corresponds to sub folders (typically different experiments). Ideally, code that uses the interface should not care about partitions and may get all files from all partitions sequentially. This is similar to how Matlab defines Datastore, in particular FileDatastore. Note that it is still possible to obtain information about partitions.

Whenever some code needs a file path/folder, it should ask storage manager for it. For example, a call to fileName = sm.getFileForPlane('f', 1) will return a full file path for f results file that corresponds to the first plane. After that, the code may proceed with working with the file name, e.g. fid = fopen(fileName, 'w');.

Attached is an UML diagram of the interface, where symbols before each member name are

+         Public
-         Private
#         Protected

and other properties are:

italic     abstract

You may see that the interface implements basic operations with db entries within itself.

Important! With new storage manager, variables for file paths and folders have moved from options to db entries. Moreover, if desired, every db entry may have a different rootStorage, RegFileBinLocation, e.t.c.

An implementation class CortexLabStorageManager is provided as well. I tried to write this class so that it is backwards compatible with former db and ops0 variables. There is also a static method convertDbToStorageManager, which converts former db to new representation.

I've also added unit tests for CortexLabStorageManager. They are found in file CortexLabStorageManagerTest.m in folder tests. By default, test suite automatically downloads 5 tiff files from a shared dataset (provided by Suite2P and stored on Google Drive). One can, however, download files manually and point the test suite to it either through an environment variable or by modifying the code of CortexLabStorageManagerTest directly.

Limitations

I focused on a standard non-GUI pipeline only. This means that there files that use storage manager and there are still a lot of files inside Suite2P, which do not. It is hard for me to judge if all files should be modified. From my perspective there is a lot of code which seams to be dead. I call a function or a file dead if there are no references to that function/file from any other file in the toolbox.

The part of the pipeline that deals with red channel (i.e. run_REDaddon_sourcery) is not yet implemented to work with storage manager. I shall get some test files next week and fix this.

Final words

I hope that you will find this patch useful. Feel free to comment or ask any questions. We would really love to see this merged in the main code base as we use different folder structure and every time there is a new commit upstream we have to go through a tedious process of manual merging.

marius10p commented 6 years ago

Hi Vadim,

This pull request is rejected. Please maintain your code in your own branch and do not attempt to push more code to the master branch, which we maintain. As you pointed out, the pattern of dependencies is complex and we change it periodically with updates. We cannot risk breaking some of the functionality of Suite2p to make it easier for your lab to change the options. In your case, it looks like this pull request would break all but the core functionality.

Main reason to reject this particular request is that there are a lot of people already using the current interface and this change would break their code. The other reason is that you are breaking a lot of our code, including crucial GUI functionality, red channel code, z-stack code, etc as well as offline code that we use to do post-processing and analysis. The final reason is that nothing was broken. The current system for specifying options and paths works as intended and is accessible to a wide range of people with minimal Matlab experience.

You should have asked first, as there are multiple ways to specify data paths for Suite2p. Reading the readme file would have directed you to the make_db_example script, which shows how to set the folder with the tiffs manually:

https://github.com/cortex-lab/Suite2P/blob/master/configFiles/make_db_example.m

Sorry for your troubles in making this pull request, and I hope it will be useful to you in your private branch, from which you can pull regularly from Suite2p if you want to get our latest updates. You can also open an issue if there is a particular data structure you would prefer to see implemented, though we don't guarantee a speedy implementation.

fralik commented 6 years ago

Hi Marius,

I wonder if you ever have thought about the concept. I can fix the GUI part and red channels and offline tools as well. No problem here. This is why I made the merge request now, before I fix everything, so that we can discuss and add more commits if needed.

Of course I read the readme, I studied examples and worked through the code. The point is that there is no flexibility in code like this:

        ops1{i,j}.RegFile = fullfile(ops.RegFileRoot, ...
            sprintf('%s_%s_%s_plane%d.bin', ops.mouse_name, ops.date, ...
            ops.CharSubDirs, i + (j-1)*numPlanes));

The only flexibility, which the example file provides, is to use a RootStorage variable in db entry. It does't change the way the code above constructs a file path. And besides, with code as above you just break DRY principle. It is sad if you do not see it.

I understand the situation with established users. This is why I provided a default implementation and a function that allows to use storage manager through current db and ops0.