file format peeves - GitHub issues

nschloe commented 5 years ago

Developing meshio, I noticed some deficiencies in the MDPA file format

https://github.com/KratosMultiphysics/Kratos/wiki/Input-data

which make it unsuitable for efficient consumption.

In order, here they are:

* The blocks don't specify the number of nodes/elements/... up front. This way, every reader has to go through the file line by line and check whether the line is `End Something`. It would be much more efficient to read the entire data set en bloc, but that's only possible if the number of items is given up front, e.g., `Begin Nodes 2415`.

* Element/Node IDs and data are mixed, line by line. This makes it necessary to have a separate read for every line, singling out the first component. It would be more efficient to read a certain number of `float`s at once, then a certain number of `int`s. This could be achieved by separating the IDs and the data.

* No binary data. All data is given as ASCII. I'm not sure why this decision was made; perhaps to make debugging easier? In my experience, this has never helped much. Binary data has the advantage of being true to the machine representation, so you don't have to cough up a 16th decimal which is only half true. Also, reading is much faster if you can tell the computer to simply read the next n bytes and interpret them as m `float32`, for example.
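The three points can be illustrated with a toy node block. This is a minimal Python sketch (Python because meshio is written in it); the binary layout used here, a count followed by all IDs as little-endian int32 and then all coordinates as float64, is a hypothetical illustration and not an existing MDPA variant.

```python
import struct

# Toy node block in the current MDPA style: one "ID x y z" row per node,
# terminated by a sentinel line instead of a count.
MDPA_NODES = """Begin Nodes
1 0.0 0.0 0.0
2 1.0 0.0 0.0
3 0.0 1.0 0.0
End Nodes"""


def read_nodes_sentinel(text):
    """Current style: every line is compared against 'End Nodes' and split,
    and the ID has to be singled out row by row."""
    lines = iter(text.splitlines())
    next(lines)  # skip the "Begin Nodes" header
    ids, coords = [], []
    for line in lines:
        if line.strip() == "End Nodes":
            break
        fields = line.split()
        ids.append(int(fields[0]))
        coords.extend(float(v) for v in fields[1:4])
    return ids, coords


def write_nodes_binary(ids, coords):
    """Hypothetical layout: count (int32), then all IDs (int32),
    then all coordinates (float64), little-endian."""
    n = len(ids)
    return (struct.pack("<i", n)
            + struct.pack(f"<{n}i", *ids)
            + struct.pack(f"<{3 * n}d", *coords))


def read_nodes_binary(buf):
    """The count is known up front, so each array is decoded in one call."""
    (n,) = struct.unpack_from("<i", buf, 0)
    ids = list(struct.unpack_from(f"<{n}i", buf, 4))
    coords = list(struct.unpack_from(f"<{3 * n}d", buf, 4 + 4 * n))
    return ids, coords


ids, coords = read_nodes_sentinel(MDPA_NODES)
assert read_nodes_binary(write_nodes_binary(ids, coords)) == (ids, coords)

# Binary float64 round-trips bit-exactly; a truncated decimal string does not.
x = 1.0 / 3.0
assert struct.unpack("<d", struct.pack("<d", x))[0] == x
assert float(f"{x:.8f}") != x
```

The sentinel reader has to touch and split every line; the binary reader only needs the count to know how many bytes to decode for each array.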

If you want to make the format better, I'd suggest considering some of the above options.

EVEN BETTER of course would be to ditch this custom format and use one of the million existing ones. Yes, yes, I get it, they don't exactly fit your use case.

[image: standards]

Perhaps XDMF is for you. It's well-adopted, well-written, and has an `<Information>` field which can be used for just about anything.

loumalouomega commented 5 years ago

I will tag @pooyan-dadvand, @RiccardoRossi and the @KratosMultiphysics/technical-committee in general. I'm including @KratosMultiphysics/implementation-committee too.

loumalouomega commented 5 years ago

Thanks for your suggestions.

BTW, updating the MDPA interface in meshio is still pending on my side.

philbucher commented 5 years ago

some thoughts from my side:

* I think we should at least this time not reinvent the wheel (even though this is what we like to do ;) ).

* We already have HDF5 support for the ModelPart! This has been working for a while already, but we don't use it for this purpose. I don't think there is currently a better format available.

loumalouomega commented 5 years ago

In any case, we can add some minor improvements in stages, always ensuring backward compatibility.

philbucher commented 5 years ago

> In any case, we can add some minor improvements in stages, always ensuring backward compatibility.

If we find someone that is willing to invest time into this :D

RiccardoRossi commented 5 years ago

As brief feedback: when we designed the format, we deliberately avoided adding the number of entries at the beginning of each block, to simplify writing. The .mdpa was a simplification of the input file we had before, whose parser was based on Boost Spirit.
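On the writing side, a streaming writer indeed does not know the count when it emits the header. One common workaround, sketched here assuming a seekable output and a hypothetical zero-padded count field of fixed width, is to write a placeholder and patch it once the block is done, so adding counts would not necessarily force the writer to buffer the whole block.

```python
import io


def write_block_with_count(stream, name, rows, width=10):
    """Write 'Begin <name> <count>' ... 'End <name>' without knowing the count
    in advance: reserve a zero-padded count field of fixed width, then patch it.

    Needs a seekable stream (a file or io.StringIO, not a pipe). The counted
    header itself is hypothetical; current MDPA headers carry no count.
    """
    header_pos = stream.tell()
    stream.write(f"Begin {name} {0:0{width}d}\n")      # placeholder, e.g. 0000000000
    count = 0
    for row in rows:                                    # rows may be a lazy generator
        stream.write(" ".join(str(v) for v in row) + "\n")
        count += 1
    stream.write(f"End {name}\n")
    if len(str(count)) > width:
        raise ValueError("count does not fit in the reserved header field")
    end_pos = stream.tell()
    stream.seek(header_pos)
    stream.write(f"Begin {name} {count:0{width}d}\n")  # same length, so it overwrites in place
    stream.seek(end_pos)
    return count


buf = io.StringIO()
write_block_with_count(buf, "Nodes", ((i, 0.0, float(i), 0.0) for i in range(1, 4)))
print(buf.getvalue().splitlines()[0])  # Begin Nodes 0000000003
```

A reader can then parse the count with `int()`, leading zeros included. The cost of this approach is that output to non-seekable targets such as pipes is no longer possible.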

I agree with Philipp that rather than further extending mdpa, we should concentrate on a more standard input model.

I am not 100% sure HDF5 is the way to go since, for example, I wouldn't know how to write it from GiD, and on Windows it is a less than pleasant ride to install.

I would also consider JSON (or its binary variants), since it removes the parsing burden from us; however, this comes at an extra memory cost, which may be inadmissible.

I am also worried about MPI reading, since we need to ensure that no single process holds the entire file in memory or parses it into a local database.

In short, I don't have a real alternative for now... but proposals would be welcome.
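Regarding the MPI concern: if the number of entries is known and every record has a fixed binary size, each process can compute its own byte range and read only that slice. A minimal sketch under those assumptions; the record layout is hypothetical, and `rank`/`n_ranks` are plain arguments here, whereas in practice they would come from MPI (e.g., mpi4py's `MPI.COMM_WORLD.Get_rank()` and `Get_size()`).

```python
import os
import struct
import tempfile

# Hypothetical fixed-size node record: int32 ID + three float64 coordinates (28 bytes).
RECORD = struct.Struct("<i3d")


def partition(n_items, n_ranks, rank):
    """Contiguous block partition of range(n_items) over n_ranks processes."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def read_my_nodes(path, n_items, n_ranks, rank, header_size=0):
    """Read only this rank's records by seeking straight to its byte offset,
    so no process ever loads the whole file."""
    start, stop = partition(n_items, n_ranks, rank)
    with open(path, "rb") as f:
        f.seek(header_size + start * RECORD.size)
        data = f.read((stop - start) * RECORD.size)
    return [RECORD.unpack_from(data, i * RECORD.size) for i in range(stop - start)]


# Usage: write 10 nodes, then let 3 "ranks" read their own slices.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for node_id in range(1, 11):
        tmp.write(RECORD.pack(node_id, float(node_id), 0.0, 0.0))
parts = [read_my_nodes(tmp.name, 10, 3, rank) for rank in range(3)]
os.remove(tmp.name)
```

Element blocks with a fixed number of nodes per element work the same way; variable-size records would additionally need an offset table.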

philbucher commented 5 years ago

I agree that we cannot always count on HDF5, since it brings additional dependencies. We could use it as the high-end solution for large models (where its benefits pay off).

As a native solution we could use JSON; this seems like the logical step to me, since we already use it extensively in Kratos. It might not be perfect in terms of speed/memory, but if that becomes important, we would go for HDF5.

And of course we will support mdpa for a long, long time for backward compatibility.
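To make the JSON option concrete: parsing is handled by a standard library, and only the layout plus a few consistency checks remain our responsibility. The schema below (key names, nesting, the element block name) is purely hypothetical. Note that `json.loads` materializes the whole document in memory, which is the extra memory cost raised earlier in the thread.

```python
import json

# Hypothetical JSON layout for a model part (not an existing Kratos schema).
# IDs and data live in separate arrays; counts are implicit in the array lengths.
MODEL_PART_JSON = """
{
  "nodes": {
    "ids": [1, 2, 3, 4],
    "coordinates": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
  },
  "elements": {
    "Element2D3N": {"ids": [1, 2], "properties": [0, 0],
                    "connectivity": [[1, 2, 3], [1, 3, 4]]}
  },
  "sub_model_parts": {"Boundary": {"nodes": [1, 2]}}
}
"""


def load_model_part(text):
    """Parsing is delegated to the json module; only consistency checks remain ours."""
    mp = json.loads(text)
    nodes = mp["nodes"]
    if len(nodes["ids"]) != len(nodes["coordinates"]):
        raise ValueError("nodes: ids and coordinates differ in length")
    known = set(nodes["ids"])
    for name, block in mp["elements"].items():
        for conn in block["connectivity"]:
            if not known.issuperset(conn):
                raise ValueError(f"{name}: element references an unknown node")
    return mp


model_part = load_model_part(MODEL_PART_JSON)
```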

RiccardoRossi commented 5 years ago

@philbucher regarding HDF5, how would you actually create the input? From which preprocessor?

philbucher commented 5 years ago

@RiccardoRossi I am of course not the expert here, but I assume the preprocessor would need the HDF5 libs too. Then I would say it is the same procedure: open the file and write to it.

Maybe the writing can even be generalized, i.e., one only passes an object that manages the access to the file itself (mdpa, JSON, HDF5) to the ModelPart-writing routine.

@msandre you have more experience here, what do you think?
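The generalization suggested above resembles a strategy pattern: the writing routine receives an object that owns the file access and knows nothing about the format. A minimal Python sketch; all class and method names are hypothetical and not Kratos' actual IO classes, and an HDF5 sink (e.g., via h5py) would simply be a third implementation of the same two methods.

```python
import io
import json
from typing import Protocol, Sequence


class ModelPartSink(Protocol):
    """Hypothetical interface: the only thing the writing routine knows about."""

    def write_nodes(self, ids: Sequence[int], coords: Sequence[Sequence[float]]) -> None: ...

    def close(self) -> None: ...


class MdpaSink:
    """Streams MDPA-style text as it goes."""

    def __init__(self, stream):
        self.stream = stream

    def write_nodes(self, ids, coords):
        self.stream.write("Begin Nodes\n")
        for node_id, (x, y, z) in zip(ids, coords):
            self.stream.write(f"{node_id} {x!r} {y!r} {z!r}\n")
        self.stream.write("End Nodes\n")

    def close(self):
        pass


class JsonSink:
    """Collects everything and serializes once on close."""

    def __init__(self, stream):
        self.stream = stream
        self.data = {}

    def write_nodes(self, ids, coords):
        self.data["nodes"] = {"ids": list(ids), "coordinates": [list(c) for c in coords]}

    def close(self):
        json.dump(self.data, self.stream)


def write_model_part(ids, coords, sink: ModelPartSink) -> None:
    """Format-agnostic routine: swapping the sink swaps the file format."""
    sink.write_nodes(ids, coords)
    sink.close()


ids, coords = [1, 2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
for sink_type in (MdpaSink, JsonSink):
    out = io.StringIO()
    write_model_part(ids, coords, sink_type(out))
    print(out.getvalue())
```

The two sinks also show the trade-off discussed in the thread: the text sink streams, while the JSON sink has to hold the whole model part until `close()`.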

RiccardoRossi commented 5 years ago

Just out of curiosity, here was my experiment with JSON:

`Kratos/kratos/includes/json_io.h`

It used to work; however, it was based on RapidJSON, so I am not convinced it still works today. Anyhow, it was just a first attempt (a completely unilateral one, too), so I have no intention of defending it in any way.

Regarding XDMF (following @nschloe's suggestion), we should take a look. My problem, however, is that it is XML-based, and so far we have tried not to make use of XML. (I personally much prefer JSON over XML.)

philbucher commented 5 years ago

Yeah, I know about the JSON trial.

Regarding XDMF, I have the same opinion: I also prefer JSON over XDMF. Also, I am not sure how usable XDMF is for large models. We could probably use binary data and/or compression, but then we (i.e., the preprocessor) would again depend on some library, which is the same situation as with HDF5.

If the preprocessor supports Python, one can use h5py (pip-installable).

loumalouomega commented 5 years ago

I just want to comment that an easy way to increase reading speed and reduce the file size would be to include the colors in the file: the submodelparts could then be read at the same time as the elements/nodes, and the submodelparts section would be simplified.

philbucher commented 5 years ago

@loumalouomega could you please explain, I don’t fully understand :)

roigcarlo commented 5 years ago

The problem I see with the JSON trial is that it's still a custom format; we just changed the way it's displayed.

The suggestion, as I understand it, goes in the direction of adopting an existing format, the same way we support writing in VTK, UNV, etc. I agree that such a format may not support all the features we include, hence the `<Information>` tag from XDMF.

In this respect I see things as @philbucher does: if GiD supports writing HDF5, we should have no problem adopting it in Kratos. Moreover, HDF5 has a C++ interface, so even if one cannot or does not want to install h5py, we should be able to provide it from Kratos.

loumalouomega commented 5 years ago

> @loumalouomega could you please explain, I don’t fully understand :)

Right now the submodelparts are stored in a separate section of the mdpa. This section is huge and, in my opinion, redundant. Using https://github.com/KratosMultiphysics/Kratos/blob/master/kratos/utilities/assign_unique_model_part_collection_tag_utility.cpp it is possible to create tags that identify, with a single ID, all the submodelparts an entity belongs to.
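A minimal sketch of the idea as described: each entity stores a single integer tag (its "color"), and a small table maps each tag to the exact combination of submodelparts. This is an independent Python illustration of the concept, not the interface of the linked C++ utility; nested submodelparts are ignored for simplicity.

```python
def assign_collection_tags(entity_ids, sub_model_parts):
    """Give every entity one integer tag that identifies the exact set of
    sub model parts it belongs to.

    sub_model_parts: mapping from sub model part name to a set of entity IDs.
    Returns (tag_of_entity, names_of_tag); tag 0 means "in no sub model part".
    """
    combo_to_tag = {(): 0}
    tag_of_entity = {}
    for eid in entity_ids:
        combo = tuple(sorted(name for name, members in sub_model_parts.items()
                             if eid in members))
        if combo not in combo_to_tag:
            combo_to_tag[combo] = len(combo_to_tag)   # next free tag
        tag_of_entity[eid] = combo_to_tag[combo]
    names_of_tag = {tag: list(combo) for combo, tag in combo_to_tag.items()}
    return tag_of_entity, names_of_tag


def members_of(name, tag_of_entity, names_of_tag):
    """Rebuild one sub model part's member list from the per-entity tags."""
    return [eid for eid, tag in tag_of_entity.items() if name in names_of_tag[tag]]


tags, table = assign_collection_tags([1, 2, 3, 4], {"Inlet": {1, 2}, "Wall": {2, 3}})
# tags  -> {1: 1, 2: 2, 3: 3, 4: 0}
# table -> {0: [], 1: ['Inlet'], 2: ['Inlet', 'Wall'], 3: ['Wall']}
```

The file would then carry one integer per entity plus the small tag table, instead of repeating entity IDs in every submodelpart block; the tags could be written alongside the nodes/elements so everything is read in one pass.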

nschloe commented 5 years ago

> I think we should at least this time not reinvent the wheel

:+1:

> We already have HDF5 support for the ModelPart! This has been working for a while already, but we don't use it for this purpose. I don't think there is currently a better format available.

Some clarifications: HDF5 is not a mesh format, it's a general data format, like JSON. You can store a mesh in HDF5 alright, but you'll have to specify what your field names are, in which order you store the elements etc. In other words: Create a container format that uses HDF5 internally.

Note that there are at least three mesh formats which already do that: MOAB, MED, and XDMF.

Storing a mesh "in HDF5" or "in JSON" essentially means to create a new mesh format that only Kratos uses, just like MDPA. I would not recommend that.

Ideally, Kratos would use a mesh format that is already out there and has some ecosystem support, e.g., something that can be opened by ParaView and be consumed by most software packages.

> In any case, we can add some minor improvements in stages, always ensuring backward compatibility.

What you should try to avoid is creating a series of incompatible formats. People will use them, store their meshes, and still try to use them ten years from now. You'll have to support them for a long time.

> I agree with Philipp that rather than further extending mdpa, we should concentrate on a more standard input model.

:+1:

> My problem, however, is that it is XML-based, and so far we have tried not to make use of XML. (I personally much prefer JSON over XML.)

:+1: I also prefer JSON and YAML; it's a pity they chose XML at the time, but in their defense, it was really popular then. It's not too bad either.

Some other clarifications: XDMF can use HDF5, but it can also store the data inline in binary and ASCII. If you care about being able to read a mesh file with your own eyes, then the latter would be an option. Neither inline binary nor ASCII requires any packages other than Python's built-in ones, so that's a plus, too.
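To illustrate the inline ASCII option: a minimal XDMF document can be produced with nothing but Python's standard library. This sketch assumes an unstructured triangle mesh; the element and attribute names follow the XDMF model (`Domain`, `Grid`, `Topology`, `Geometry`, `DataItem`, `Information`), while the grid name and the `Information` content are made up. It should be readable by XDMF readers such as ParaView's, but that is an assumption.

```python
import xml.etree.ElementTree as ET


def inline_xdmf(points, triangles):
    """Minimal XDMF 3 document with heavy data inline as ASCII (Format="XML").

    Connectivity indices are 0-based positions into the point list,
    unlike MDPA's 1-based node IDs.
    """
    n_pts, n_tri = len(points), len(triangles)
    xdmf = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(xdmf, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="mesh", GridType="Uniform")

    # Free-form metadata slot; the name/value used here are made up.
    ET.SubElement(grid, "Information", Name="source", Value="hypothetical-example")

    topology = ET.SubElement(grid, "Topology", TopologyType="Triangle",
                             NumberOfElements=str(n_tri))
    conn = ET.SubElement(topology, "DataItem", Dimensions=f"{n_tri} 3",
                         NumberType="Int", Format="XML")
    conn.text = "\n".join(" ".join(str(i) for i in tri) for tri in triangles)

    geometry = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    xyz = ET.SubElement(geometry, "DataItem", Dimensions=f"{n_pts} 3",
                        NumberType="Float", Precision="8", Format="XML")
    xyz.text = "\n".join(" ".join(repr(c) for c in p) for p in points)
    return ET.tostring(xdmf, encoding="unicode")


doc = inline_xdmf(
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
)
print(doc)
```

Note that the element count and array dimensions are stated up front in the attributes, which is exactly the property the original post asks for.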

RiccardoRossi commented 5 years ago

Dear @nschloe,

you are of course right about what you say about HDF5 and JSON (or XML).

I also agree with your comment about backward compatibility.

Still, using JSON or any of these formats frees us from the parsing work, which is already a big simplification. In this light, adopting a storage format other than our own already simplifies our life.

My big problem here is that, even though it is relatively easy to write an IO in Kratos for a given file format, it is not at all an easy task to have the preprocessor (read: the GUI) write into a given format, particularly if an external lib is needed, and even less so in a way that is portable across OSs (for example, we wouldn't be able to do it from GiD, which is our main preprocessor).

Still, I will take a look at the formats you point out to get new ideas.

philbucher commented 5 years ago

thanks a lot @nschloe for the explanations and comments, very helpful!

I briefly looked at your suggestions; I guess each of them has its advantages and disadvantages.

When I have time I will try to do some more research / make a short summary about the available formats.

(@RiccardoRossi MOAB is LGPL; this is not compatible with BSD, right?)

loumalouomega commented 5 years ago

> (@RiccardoRossi MOAB is LGPL; this is not compatible with BSD, right?)

It depends on the LGPL version. It is like MMG: you are able to build private code on top of it, but you must always release any modifications made, and the two cannot be bundled together (that's why you must download MMG separately).

RiccardoRossi commented 5 years ago

I don't know the details of MOAB, but we cannot rely on LGPL stuff for our core behaviour.

msandre commented 5 years ago

I would try to find an alternative to Xdmf. Its documentation is not great IMO, and I encountered some inconsistencies between implementations. It would be good to go with a format that has lots of existing tools available.

pooyan-dadvand commented 5 years ago

Thanks @nschloe for your comments and suggestions. In fact, you are right on most of the points, and as the writer of the last 3 Kratos formats, I would be the one to blame.

Here is some history and some thoughts on how we arrived at this point and how to proceed:

* mdpa was the third attempt at a Kratos input file. At that time we wanted a human-readable/modifiable format, and we were fed up with fighting compilers over the many libraries we had. So we opted for a text format that needs no additional libraries to read. Details like the ability to add comments, the flexible format, etc. are the result of these criteria and of the fact that input files were small at that time.
* mdpa dates back to 2006, and at that time the other options were too raw to be adapted for multi-physics data.
* There are many pending improvements for the mdpa format (custom blocks, a Geometry block, a binary version, etc.), but we arrived at the conclusion that we should adopt an existing format rather than extending this one further (in line with your suggestion).
* None of the formats you mentioned are really mainstream. MOAB is not known in industry (maybe due to its LGPL license). MED is known through Salome, and reading it would be a nice feature, but it is not a format produced by mainstream mesh generators. XDMF is a good option considering that it unifies the pre- and post-processing formats for the VTK case. Neither is very popular.
* Unfortunately, the standard formats in industry are the old ones (like the NASTRAN mesh format), which are far from what we are looking for.
* HDF5, although a popular container, is very rigid and lacks important features like cross-references.
* IMO we can take two approaches: adopting a legacy format (in ASCII or HDF5) and enriching it via our ProjectParameters (which makes it incompatible with the standard ones), or having our own format based on some standard language like JSON. Using other codes' input formats just because it is convenient with a given preprocessor would be an interesting extension, but I don't see it as a core feature.

Turning back to your original comment, my question is: what is your performance issue with the current model part IO? Just to mention: reading an mdpa with several Nodes blocks would be very slow due to the internal check for duplicated node IDs.

loumalouomega commented 5 years ago

Today I discovered this via the OpenFOAM repo: https://github.com/ornladios/ADIOS2 (it is not from them, but they are developing the IO for this format).

pooyan-dadvand commented 5 years ago

Interesting!

RiccardoRossi commented 4 years ago

@KratosMultiphysics/technical-committee considers that XDMF can eventually be used as an alternative input to Kratos. There is some ongoing work to make this happen.

However, we consider that we need our own default input. We do agree, though, that we should free ourselves from the parsing burden, for example by adopting JSON as a container format. This does not close the door to other alternative containers.

For now, we are closing this issue.

Thanks all for the suggestions; please reopen if you consider that we did not address this properly.

KratosMultiphysics / Kratos

file format peeves #5365