glotaran / pyglotaran

A Python library for Global and Target Analysis of time-resolved spectroscopy data
GNU Lesser General Public License v3.0

Agree on a default storage format #145

Closed s-weigand closed 5 years ago

s-weigand commented 6 years ago

Description

This is a general design issue. As issues about supporting different common file formats come up (#143), I think we are at a point where we should agree on a default storage format. Of course, legacy support for the formats used in GTA1.5 should be provided as well.
IMHO this shouldn't be .csv, since it needs extra mappings for higher-dimensional data (see #142). Also, making a custom mapping to .csv would be reinventing the wheel and would become part of the "how to parse this format?" problem.
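
To illustrate the extra mapping, here is a minimal sketch using only the Python standard library. The axis names (`time`, `wavelength`), the long-format column layout and the sample values are assumptions made up for this example, not an existing glotaran convention. A 2D matrix has to be flattened into rows to fit into a .csv, and the reader has to know that convention to rebuild it:

```python
import csv
import io


def matrix_to_long_csv(times, wavelengths, data):
    """Flatten a 2D (time x wavelength) matrix into long-format CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The header is the "custom mapping": one row per matrix cell.
    writer.writerow(["time", "wavelength", "value"])
    for i, t in enumerate(times):
        for j, w in enumerate(wavelengths):
            writer.writerow([t, w, data[i][j]])
    return buffer.getvalue()


def long_csv_to_matrix(text):
    """Rebuild the axes and the matrix; works only if the mapping is known."""
    rows = list(csv.DictReader(io.StringIO(text)))
    times = sorted({float(r["time"]) for r in rows})
    wavelengths = sorted({float(r["wavelength"]) for r in rows})
    t_index = {t: i for i, t in enumerate(times)}
    w_index = {w: j for j, w in enumerate(wavelengths)}
    data = [[None] * len(wavelengths) for _ in times]
    for r in rows:
        i = t_index[float(r["time"])]
        j = w_index[float(r["wavelength"])]
        data[i][j] = float(r["value"])
    return times, wavelengths, data


# Made-up sample data: 2 time points x 3 wavelengths.
times = [0.0, 1.0]
wavelengths = [400.0, 500.0, 600.0]
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

text = matrix_to_long_csv(times, wavelengths, data)
restored = long_csv_to_matrix(text)
```

Every additional dimension adds another column and another piece of convention that both the writer and the reader have to agree on.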

I think that hdf5 is a good choice since:

For some more information on HDF5, you can watch these talks: "HDF5 is Eating the World" (SciPy 2015) and "Store and manage data effortlessly with HDF5" (PyData Amsterdam 2016).

SerLap commented 6 years ago

Well, we have been discussing HDF5 for some time already; there is even some code written for HDF5 support in GTA1.5. It can be an option, and it would keep us flexible enough for future development and for adding new file formats and new data types. The only reason it's not there yet is the lack of manpower to implement it.

There is also a difference between the format used inside the program (at the moment it's a serialised Java class) and the formats that can be used to load data into the program. If we want to be user friendly, a large number of file formats should be supported; if you ask people to first convert their data to something different, it will discourage some of them from using it. This is the reason why we have SDT (output of the BH counting card), IMG (output of the Hamamatsu streak camera), PT.. (output of PicoQuant cards) and different types of ASCII. None of those has anything to do with the storage format; the storage format is something else. At some point we agreed with Joris on HDF5 for that, and HDF5 can also be the default export file format, but already at the moment we have a kind of common storage format.

But again, we should not restrict ourselves to only one input file format. Also, you would not be able to get rid of CSV or ASCII files, because de facto that is the most used file format for transferring data between the different programs people use.

joernweissenborn commented 6 years ago

@SerLap Thanks for your comment

HDF5 will not be the default output format for Glotaran. There are mainly 2 reasons. First of all, I dislike HDF5. Second, I don't see any reason to have a 'default' format.

Why should we need a 'default' format? If a format can be read by GTA, it can be written by GTA. Aside from this, there is already a PR pending which basically gives us methods à la Dataset.from_pandas and the reverse, Dataset.to_pandas. The rest can be done by pandas. I see no reason why GTA should care more about data than that it fulfils a simple interface.

About my dislike of HDF5: you have to understand that HDF5 is basically FAT32 augmented with symlinks. It is not a file; it is a filesystem with files inside one big file. Filesystems are much more efficient at these things. A folder structure plus binaries accompanied by metafiles, that's for the win. If you want to get rid of ASCII, look at msgpack, for example. Just because ASCII has its limits doesn't mean that you need to replace it with a filesystem. The people writing NTFS and ext4 have much smarter solutions.
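
As a rough sketch of the folder-plus-metafile idea, using only the Python standard library: the file names `data.bin` and `meta.json`, the metadata fields and the helper names are assumptions for illustration, not an existing glotaran layout.

```python
import json
import tempfile
from array import array
from pathlib import Path


def save_dataset(folder, values, shape, axes):
    """Write raw binary values plus a JSON metafile into one folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Raw C doubles in native byte order; a real format would pin the byte order.
    (folder / "data.bin").write_bytes(array("d", values).tobytes())
    meta = {"dtype": "float64", "shape": shape, "axes": axes}
    (folder / "meta.json").write_text(json.dumps(meta, indent=2))


def load_dataset(folder):
    """Read the values and the metadata back from the folder."""
    folder = Path(folder)
    meta = json.loads((folder / "meta.json").read_text())
    values = array("d")
    values.frombytes((folder / "data.bin").read_bytes())
    return list(values), meta


# Made-up sample data: a flattened 2 x 3 matrix.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "measurement"
    save_dataset(target, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3], ["time", "wavelength"])
    loaded_values, loaded_meta = load_dataset(target)
```

The metafile stays human-readable, and the binary can be read by any tool that knows the dtype and shape.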

A last thing maybe: don't confuse the Python glotaran with 1.5. The Glotaran that Joris created is a UI for TIMP. The new glotaran I created is a Python library which gives you all the power of TIMP, but no GUI other than matplotlib and IPython.

I know this comment is incomplete, but see it as a discussion opener.

joernweissenborn commented 5 years ago

OK, since there are no new contributions to the discussion, I will close the issue so we can focus on the pressing issues.