Ulm-IQO / qudi

(Legacy project) A modular laboratory experiment management suite, predecessor to qudi-core.
GNU General Public License v3.0
166 stars 141 forks source link

Allow data to be saved in subfolders #652

Closed timoML closed 3 years ago

timoML commented 3 years ago

Currently, calling save_data() to a DataStorage object will always save to a constant directory that is fixed during construction of the DataStorage object. This PR introduces a lightweight possibility to specify a sub directory. It might come in handy when an external script is running the same measurement on different POI targets (see screenshot).

Description

To be minimal code invasive, the nametag is checked for a possibly containing path. Eg. instead of using a 'pulsed_measurement' nametag, one could specify 'nv0/pulsed_measurement'. Although it overloads the nametag with additional functionality, I find this more elegant than introducing new parameters in all methods of the save toolchain. Importantly, external scripts can still use the save_measurement_data() methods of any module and would just need to alter the nametag provided.

How Has This Been Tested?

Tested on dummy and saving pulsed measurement data. As saving is not implemented for most modules yet, it's currently hard to expand testing.

Screenshots (only if appropriate, delete if not):

image

Types of changes

Checklist:

timoML commented 3 years ago

I allowed myself to add a fix for the NpyDataStorage. Please have a look whether this is the intended behavior. kwargs['comments'] is definetly not implemented in DataStorageBase.

Neverhorst commented 3 years ago

save_data and load_data are supposed to have tons of keywords, so I don't see a problem in creating a new one if it adds functionality. Especially because they are all keyword-only and optional (no incompatible call signatures).
Please do not introduce ambiguity by overloading arguments with non-obvious functionality.

However in this particular case it is not necessary. All config parameters in the storage class __init__ can be manipulated after instantiation by their respective attributes. In this case you can do something like this:

# Create storage instance
my_data_store = TextDataStorage(sub_directory='nv0')
# Save data to <qudi_data_dir>/nv0/
my_data_store.save_data(data)
# Change subdirectory
my_data_store.sub_directory = os.path.join('another_dir', 'nv1') 
# Save data to <qudi_data_dir>/another_dir/nv1/
my_data_store.save_data(data)

An even better way would be to simply create another instance of the storage object. These objects __init__ is lightweight enough to be created and destroyed on demand.
You do not need to create this object e.g. inside on_activate. It's ok to create it only when saving data.

timoML commented 3 years ago

I'm aware of the possibility to simply create a new DataStorage. :) The issue with this solution is the particular use case of some external script interacting with the save method of a module: Eg. pulsed_measurement_logic does quite a lot of logic for its saving. If I don't want to replicate the respective code in the external script, there is currently no easy way to specify a sub-directory and make use of the existing module code. One can of course argue that this is the responsiblity of the module (not the core). Then however, all our modules would need an additional, preferebly consistent, keyword argument to create sub dirs.

I see that overloading the nametag is not a perfect solution either. Here, I find the "folder/tag" a quite natural extension, though. Especially, since slashes are not permitted in file names and cause raising in the current code anyway.

alrik-durand commented 3 years ago

I don't have any comment on this feature itself but I would like you mention that additional parameters to save logic can be used to easily sort data by POI or any other parameters (multiple parameters if you like !) Thus removing the need to create specific folders. We do this in our lab every day and it has the advantage of no breaking the usual file structure :)

Neverhorst commented 3 years ago

I'm aware of the possibility to simply create a new DataStorage. :) The issue with this solution is the particular use case of some external script interacting with the save method of a module: Eg. pulsed_measurement_logic does quite a lot of logic for its saving. If I don't want to replicate the respective code in the external script, there is currently no easy way to specify a sub-directory and make use of the existing module code. One can of course argue that this is the responsiblity of the module (not the core). Then however, all our modules would need an additional, preferebly consistent, keyword argument to create sub dirs.

I see that overloading the nametag is not a perfect solution either. Here, I find the "folder/tag" a quite natural extension, though. Especially, since slashes are not permitted in file names and cause raising in the current code anyway.

I'm not sure I fully understand what you want to accomplish, @timoML

Do you want to alter the save method of a logic module from e.g. a jupyter notebook? If so then I'm strongly against making this an issue with the general utility storage class. This is very bad practice and personal preference. If you want your notebook to override measurement module behaviour it's solely the problem/responsibility of your notebook. It is not within the scope of logic development to comply with custom notebook. Notebooks can and will brake without notice between qudi releases. If you want code to be properly maintained and checked for breaking changes, you need to add it to the qudi repo. Notebooks/Skripts are just intermediate solutions.

A simple workaround that comes to mind: The notebook could move the created files after they have been saved by the logic method to a new folder.

If you want to customize saving behaviour of single logic modules in a "runtime-fixed" way, you could also add optional ConfigOption to the logic module. They are responsible for proper formatting and saving of their data (you never know what measurements people want to implement and what is the best way to store them)

But yeah, I tend to agree with @alrik-durand. You should consider encoding measurement metadata in... well... the metadata. If you want fast sorting and lookup of data, you can use the new DataStorageBase class to implement database storage (e.g. SQL, HDF5, Pandas...).

Neverhorst commented 3 years ago

Possibly another approach would be to alter the save method signature of logic modules to accept an optional directory and filename argument. This is something we wanted to implement anyways since... a long time. The reason for that is that it enables a GUI to save data by using a file selection dialog like you are used to from other applications when using "save as...". This would automatically also make it possible for scripts to fully customize the save path when calling the logic save method.