JaneliaSciComp / jeiss_fibsem_labview_control

BSD 3-Clause "New" or "Revised" License

Metadata in the conversion of DAT file to HDF5 file #2

Open mkitti opened 2 years ago

mkitti commented 2 years ago

Dear @acardona and @histonemark,

@d-v-b currently has code to ingest and export the DAT files to HDF5 in Python here: https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/fibsem.py and https://github.com/janelia-cosem/fibsem-tools/blob/master/src/fibsem_tools/io/h5.py

See also https://github.com/janelia-cosem/fibsem-tools/pull/24 .

I also wrote some code for an early demo here which may be more concrete at the moment: https://github.com/mkitti/fibsem-tools/blob/fibsem_h5/src/fibsem_tools/io/fibsem_h5.py

Earlier you had expressed an interest in text-readable metadata. That could be exported via the hdf5-json package: https://hdf5-json.readthedocs.io/en/latest/

We will likely include the former 1024-byte DAT headers as an attribute of the HDF5 file. An alternative, not mutually exclusive with the first, would be to put them into a 1 KB HDF5 userblock so that legacy DAT reader code could ingest the old header.
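To make the two options concrete, here is a rough sketch using h5py and NumPy. It is only an illustration under assumptions, not the fibsem-tools implementation: the attribute name `dat_header`, the dataset name `raw`, and the helper function names are all made up for this example.

```python
import h5py
import numpy as np

HEADER_SIZE = 1024  # size of the legacy DAT header, as discussed above


def write_h5_with_dat_header(path, header, image):
    """Store the raw DAT header both as an attribute and in the userblock."""
    if len(header) != HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} header bytes, got {len(header)}")
    # The userblock must be reserved at creation time; HDF5 requires its
    # size to be a power of two and at least 512 bytes, so 1024 is valid.
    with h5py.File(path, "w", userblock_size=HEADER_SIZE) as f:
        f.create_dataset("raw", data=image)  # hypothetical dataset name
        # np.void stores the bytes as an opaque value, so NUL bytes survive.
        f.attrs["dat_header"] = np.void(header)  # hypothetical attribute name
    # HDF5 never touches the userblock, so fill it with plain file I/O
    # once the HDF5 file handle is closed.
    with open(path, "r+b") as raw:
        raw.write(header)


def read_header_legacy(path):
    """What a legacy DAT reader would do: read the first 1024 bytes."""
    with open(path, "rb") as raw:
        return raw.read(HEADER_SIZE)


def read_header_attribute(path):
    """Read the same header back through the HDF5 attribute."""
    with h5py.File(path, "r") as f:
        return f.attrs["dat_header"].tobytes()
```

With this layout a legacy reader only sees the header (the image data that follows is HDF5, not raw DAT), while HDF5-aware code can ignore the userblock and use the attribute.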

We could send you a sample HDF5 file. Do you have any preferences with regard to metadata processing?

-Mark

histonemark commented 2 years ago

Hi Mark, Thanks! In principle we have data so we can test your tool directly here. Regarding the header data storage, either of the options you list is valid in my opinion. It's something you wanna be able to read in case you need it, but you rarely do, so I don't have a strong preference. Perhaps @acardona has? Just to add to the pile, Chris Barnes from our lab also did an implementation in Python of a fibsem.dat reader, adding here the link in case there is something of interest: jfibsem_dat repo

mkitti commented 2 years ago

I've added Chris Barnes' jfibsem dat reader to the list of reader implementations in the README.

mkitti commented 2 years ago

Looking at all these implementations makes me realize that we need to work on the canonical terms for all the attributes.

clbarnes commented 2 years ago

Yes, I did reorganise/rename them in mine in order to make it more Pythonic. I was (and generally am) very set on using member variables to represent the metadata items rather than accessing them like a dict with arbitrary/runtime-assigned keys; much easier to document, discover, reason about, and test against. As I only had access to the MATLAB implementation I had no way of telling whether the variable names there were canonical anyway.

clbarnes commented 2 years ago

If names are to be canonicalised, could I request that they be more explicit than those currently in use? Saving a few characters when writing the first implementation is meaningless compared to the number of times the implementation is read, and any further writing should autocomplete in any sane setup.
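For illustration, a minimal sketch of what explicit, attribute-based metadata could look like, as opposed to a dict with runtime-assigned keys. The class name, the field names, and the short legacy keys in the mapping are all hypothetical placeholders, not a proposal for the canonical names and not the code from any of the linked implementations.

```python
from dataclasses import dataclass

# Hypothetical mapping from terse legacy keys to more explicit field names.
LEGACY_TO_EXPLICIT = {
    "FileVersion": "file_version",
    "XResolution": "x_resolution_pixels",
    "YResolution": "y_resolution_pixels",
    "PixelSize": "pixel_size_nm",
}


@dataclass(frozen=True)
class FibsemMetadata:
    """Illustrative subset of header fields, declared up front."""

    file_version: int
    x_resolution_pixels: int
    y_resolution_pixels: int
    pixel_size_nm: float

    @classmethod
    def from_legacy_dict(cls, legacy):
        """Build the typed record from a dict keyed by legacy names.

        A missing legacy key raises KeyError here, at construction time,
        rather than at some later lookup.
        """
        return cls(**{new: legacy[old] for old, new in LEGACY_TO_EXPLICIT.items()})


legacy_header = {"FileVersion": 8, "XResolution": 4096, "YResolution": 2048, "PixelSize": 8.0}
metadata = FibsemMetadata.from_legacy_dict(legacy_header)
print(metadata.x_resolution_pixels)
```

Because the fields are declared on the class, they show up in autocomplete and documentation, and a typo such as `metadata.x_resolution` fails with an AttributeError instead of quietly creating or missing a key.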

clbarnes commented 2 years ago

Just to complete the loop mentioned in the discussions, my implementation of a script to do this is here https://github.com/clbarnes/jeiss-convert/

mkitti commented 2 years ago

Thank you, Chris.