MagneticParticleImaging / MDF

Magnetic Particle Imaging Data Format
Other
15 stars 10 forks source link

group /tracer/ should be dataset of compound datatyp #71

Closed Neumann-A closed 7 years ago

Neumann-A commented 7 years ago

From the current look of the group tracer it is not a group but a dataset with a compound datatype and should be stored accordingly. The current layout makes data abstraction unnecessary complicated. (Examples how to create/store compound datatypes -> see HDF5 tutorials)

Suggested abstraction: Define a class/struct Tracer with all Paramters and store a vector of it. Current abstraction: Define vectors of different Parameters and store all of them in one group

tknopp commented 7 years ago

No. Simple reason: Julia does not support compound datasets. We really want to make it as simple as possible to access MDF parameters. Cross language support is absolutely mandatory. Thus we favor only using a subset of features.

Side note. Have a look at ISMRMRD (the MRI data format). It cannot be read from Julia and is extremely complicated to use in python or matlab. I don't think that this is the correct route for an exchange data format.

hofmannmartin commented 7 years ago

I agree on that one. Most of the users will have very little to zero HDF5 experience. Thus we should keep it simple.

Neumann-A commented 7 years ago

No. Simple reason: Julia does not support compound datasets.

Thats seem to be not true. Line 1458 of HDF5.jl states: memtype_id = h5t_create(H5T_COMPOUND, sz); So there seems to be the possibility to create compound types in Julia just a matter of figuring out how it is done. (Also there is always the possibility to directly call c code from Julia using ccall)

Cross language support is absolutely mandatory. Most of the users will have very little to zero HDF5 experience. Thus we should keep it simple.

If you want to keep it simple for the users without knowledge of HDF5 supply them with well defined object wrappers in each language for each information the mdf file can have. Then the user can just use the object wrappers without any knowledge of HDF5 and the mdf file layout at all (which I and certainly others would really appreciate).

Side note. Have a look at ISMRMRD (the MRI data format). It cannot be read from Julia and is extremely complicated to use in python or matlab. I don't think that this is the correct route for an exchange data format.

You mean ISMRMRD? Does not seem too complicated. Just a lot of different data although I cannot argue if the layout is good or not. Also seems easy to use with matlab (at least the read in example seem easy). The main reason why it is complicated to work with: Nobody took the time to write easy to use function/object wrappers for it.

conclusion: maybe supply users with objects wrappers for mdf in each language?

In the end I dont mind how you decided. If you do not change it I will have to work around it. Which in my case just means adding another code path instead of reusing something that already exists.

*edit: updated the link to work correctly.

tknopp commented 7 years ago

Compound datatypes currently do not really work, see https://github.com/JuliaIO/HDF5.jl/issues/426

Hiding complex things behind a language agnostic API is in general a good thing. But there is several issues here:

For this reason we try to keep the file format itself to be accessible.

You seem to be an advanced programmer but please keep in mind that our user base is not necessarily capable of programming C.

Please also note that the request to use compound data types is mostly just a style question. Our current system has the same expressibility.

Are you currently implementing a library for accessing MDF? Is this C or C++? Would be great to get that library into Github so that we have more language wrappers than just MPIFiles.jl

Neumann-A commented 7 years ago

conclusion: Currently I dont have a good Idea how to handle this problem if compound types are out of question but I can work around it. So I will close it.

Are you currently implementing a library for accessing MDF? Is this C or C++?

Its C++ (i dont consider my self a C programmer) and it is not limited to MDF nor HDF5. Its an archive abstraction to serialize any kind of data to any type of archive as long as either the archive or the type knows how to serialize it. Typically the archive itself only supplies serialization support for builtin types and all other types must build their serialization ontop of that, recursivly (Thats the aim, but I already support special types liken Eigen::Matrix and Eigen::Tensor to have a clean Matrix/Tensor definition with strictly sequentiell memory order. Nested STL containers does not supply that). With that I can have a nice interface like: SomeOutputArchiveObject(createNamedValue("NameOfValue",Value)) to save and also load any type independet of how deeply nested it is without concerning myself how the archive actually does it. Of course this needs a very clean mapping between programmatic datatype and datatype representation in the storing archive.

Lets postpone the issue until I had a look at how MATLAB saves arrays of structs of the form: BaseStruct[index].NestedStruct in the V7.3 (HDF5) format. just tried it directly in matlab; stores it with some kind of strange object reference; may retry later in c++ and see what I get there; everything else looked fine in a matfile saved with v7.3 (did not try cell arrays yet) Looks like i will be able to save my c++ mdf layout using my MATLAB archive implementation. It will only add some MATLAB specific attributes to the dataset. If I would have known that I wouldn't even have bothered starting to implement a HDF5 archive

Compound datatypes currently do not really work, see JuliaIO/HDF5.jl#426

Depends. Are you sure you are supplying the correct type to the call? A better implementation is to ask the Dataset for its type (since it is stored in the file) and build the necessary Julia type dependent on what the Dataset tells you. This can be done recursivly to build quite complex and nested types. I have something similar in c++ but since c++ is strongly typed I know the type I expect and can just compare it to the type the dataset gives me (Of course I still need a mapping from programmatic datatype to hdf5 datatype). If it is not equal -> throw an exception

who is developing / maintaining the wrappers? I do so for the Julia bindings
what if someone comes up with a language currently not supported?

First we would need to define a distinct object representation for whats in the mdf: Could be something obvious as: MDF_FileInformation: getCreationTime() etc. MDF_RecieverInformation: getBandwidth(int x) etc. MDF_TracerInformation: getConcentration() etc.. etc. + all methods + returntypes (all types musst be strictly defined!) With that it gets quite easy to support another language because we basically gave the manual how to implement it. [But thats a different topic]

profix898 commented 7 years ago

We really want to make it as simple as possible to access MDF parameters. Cross language support is absolutely mandatory. Thus we favor only using a subset of features.

I second that! Using a compound type for complex (https://github.com/MagneticParticleImaging/MDF/issues/70) could be discussed, but I'm also strongly against replacing groups with compound constructs. We should really keep the format as simple as possible. Unless you have a MDF-binding for your language (as mentioned above), reading/writing groups is much simpler than handling compound types IMO. Also, how would you treat optional fields in a group? In a compound type you are not tolerant, right?

Neumann-A commented 7 years ago

Also, how would you treat optional fields in a group? In a compound type you are not tolerant, right?

Personally i think there shouldn't be any optional fields. Except in an extra group called /optional/ or /detail/

hofmannmartin commented 7 years ago

Personally, I agree. Nonetheless, in practice one often does not know all parameters. Concentration might be unknown. Transfer function is an issue for those, who did not build their own scanner from scratch.

An extra group could have been a solution, but we wanted to keep related data together, since there are optional fields in close to all groups. Hope you can live with this solution.