nschloe closed this issue 4 years ago.
I will tag @pooyan-dadvand, @RiccardoRossi, and @KratosMultiphysics/technical-committee in general. I am including @KratosMultiphysics/implementation-committee too.
While developing meshio, I noticed some deficiencies in the MDPA file format (https://github.com/KratosMultiphysics/Kratos/wiki/Input-data) that make it unsuitable for efficient consumption.
In order, here they are:
* The blocks don't specify the number of nodes/elements/... up front. This way, every reader has to go through the file line by line and check whether the line is `End Something`. It would be much more efficient to read the entire data set en bloc, but that's only possible if the number of items is given up front, e.g., `Begin Nodes 2415`.
* Element/node IDs and data are mixed, line by line. This makes it necessary to do a separate read for every line, singling out the first component. It would be more efficient to read a certain number of `float`s at once, then a certain number of `int`s. This could be achieved by separating the IDs and the data.
* No binary data. All data is given as ASCII. I'm not sure why this decision was made; perhaps to make debugging easier? In my experience, this has never helped much. Binary data has the advantage of being true to the machine representation, so you don't have to cough up a 16th decimal that is only half true. Reading is also much faster if you can tell the computer to simply read the next n bytes and interpret them as m `float32`s, for example.
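A minimal Python sketch (meshio's language) of what the first two points would buy a reader. The layout is hypothetical and is not MDPA: a little-endian int32 count up front, then all node IDs as one int32 run, then all coordinates as one float64 run. Because the count comes first, each array is fetched with a single `read` and a single `struct.unpack` call instead of one parse per line.

```python
import io
import struct


def write_nodes_block(stream, ids, coords):
    """Write a hypothetical binary node block: count, all IDs, then all coordinates."""
    n = len(ids)
    stream.write(struct.pack("<i", n))              # number of nodes, given up front
    stream.write(struct.pack(f"<{n}i", *ids))       # IDs as one contiguous int32 run
    flat = [c for xyz in coords for c in xyz]
    stream.write(struct.pack(f"<{3 * n}d", *flat))  # coordinates as one float64 run


def read_nodes_block(stream):
    """Read the block back with three bulk reads instead of one parse per line."""
    (n,) = struct.unpack("<i", stream.read(4))
    ids = list(struct.unpack(f"<{n}i", stream.read(4 * n)))
    flat = struct.unpack(f"<{3 * n}d", stream.read(8 * 3 * n))
    coords = [flat[3 * k:3 * k + 3] for k in range(n)]  # regroup into (x, y, z)
    return ids, coords


if __name__ == "__main__":
    buf = io.BytesIO()
    write_nodes_block(buf, [1, 2], [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)])
    buf.seek(0)
    print(read_nodes_block(buf))
```

The `<` prefix fixes both byte order and standard item sizes, so the byte counts passed to `read` do not depend on the platform.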
If you want to make the format better, I'd suggest considering some of the above options.
Thanks for your suggestions.
BTW, I still have a pending update of the MDPA interface in meshio.
Some thoughts from my side:
* I think that, at least this time, we should not reinvent the wheel (even though this is what we like to do ;) ).
* We already have HDF5 support for the ModelPart! It has been working for a while already, but we don't use it for this purpose. I don't think there is currently a better format available.
In any case, we can add some minor improvements, always ensuring backward compatibility, and we can add them in stages.
If we find someone who is willing to invest time in this :D
As short feedback: when we designed the format, we deliberately avoided adding the number of entries at the beginning of each block, to simplify writing. The .mdpa format was a simplification of the input file we had before, whose parser was based on Boost.Spirit.
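The writing trade-off mentioned here can be illustrated with a small Python sketch (a hypothetical helper, not Kratos code). A writer that does not know the number of entries in advance can still emit a count header of the form proposed in this issue (`Begin Nodes 2415`) by reserving a fixed-width placeholder and patching it once the block is finished ("backpatching"). The catch is that the output stream must be seekable, so this does not work when writing to a pipe.

```python
import io

COUNT_WIDTH = 10  # hypothetical: characters reserved for the count after "Begin <Name>"


def write_block_with_count(stream, name, lines):
    """Stream a block of unknown length, then backpatch its entry count into the header."""
    stream.write(f"Begin {name} ")
    count_pos = stream.tell()
    stream.write(" " * COUNT_WIDTH + "\n")  # placeholder, overwritten below
    count = 0
    for line in lines:  # `lines` may be a generator: its length is not known in advance
        stream.write(line + "\n")
        count += 1
    if len(str(count)) > COUNT_WIDTH:
        raise ValueError("count does not fit in the reserved placeholder")
    end_pos = stream.tell()
    stream.seek(count_pos)  # requires a seekable stream (fails on pipes)
    stream.write(str(count).ljust(COUNT_WIDTH))
    stream.seek(end_pos)
    stream.write(f"End {name}\n")
    return count


if __name__ == "__main__":
    out = io.StringIO()
    write_block_with_count(out, "Nodes", (f"{i} 0.0 0.0 0.0" for i in range(1, 4)))
    print(out.getvalue())
```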
I agree with Philipp that, rather than further extending mdpa, we should concentrate on a more standard input model.
I am not 100% sure HDF5 is the way to go since, for example, I wouldn't know how to write it from GiD, and on Windows it is a less than pleasant ride to install.
I would also consider JSON (or its binary variants), since it removes the parsing burden from us; however, this comes at an extra memory cost, which may be unacceptable.
I am also worried about MPI reading, since we need to ensure that no single processor holds the entire file in memory or parses it into a local database.
In short, I don't have a real alternative for now... but proposals would be welcome.
(Email reply to Vicente Mataix Ferrándiz's comment of Thu, Aug 8, 2019, 9:18 PM.)
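Regarding the MPI-reading concern, one thing a count up front enables is that every rank can compute, without parsing anything, which byte ranges hold its share of the data. A Python sketch under an assumed binary layout (not MDPA): an int32 count, then all int32 node IDs, then all float64 xyz coordinates. Each rank could then read only its own ranges (for instance with MPI-IO or a plain seek and read), so no single process ever holds the whole file.

```python
HEADER_BYTES = 4      # assumed layout: one int32 holding the node count
ID_BYTES = 4          # one int32 ID per node
COORD_BYTES = 3 * 8   # three float64 coordinates per node


def rank_slice(n_items, rank, n_ranks):
    """Split n_items as evenly as possible; return (first_index, count) for this rank."""
    base, extra = divmod(n_items, n_ranks)
    count = base + (1 if rank < extra else 0)  # the first `extra` ranks take one more
    first = rank * base + min(rank, extra)
    return first, count


def rank_byte_ranges(n_nodes, rank, n_ranks):
    """(offset, length) of this rank's IDs and of its coordinates in [count][ids][coords]."""
    first, count = rank_slice(n_nodes, rank, n_ranks)
    ids_range = (HEADER_BYTES + first * ID_BYTES, count * ID_BYTES)
    coords_start = HEADER_BYTES + n_nodes * ID_BYTES  # coordinates follow all IDs
    coords_range = (coords_start + first * COORD_BYTES, count * COORD_BYTES)
    return ids_range, coords_range


if __name__ == "__main__":
    for rank in range(3):
        print(rank, rank_byte_ranges(10, rank, 3))
```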
I agree that we cannot always count on HDF5, since it has additional dependencies. We could use it as the high-end solution for large models (where its benefits pay off).
As a native solution we could use JSON; this seems to me like the logical step, since we already use it extensively in Kratos. It might not be perfect in terms of speed/memory, but if that becomes important, we would go for HDF5.
And of course we will support mdpa for a long, long time for backward compatibility.
@philbucher regarding HDF5, how would you actually create the input? From which preprocessor?
@RiccardoRossi I am of course not the expert here, but I assume that the preprocessor would need the HDF5 libraries too. Then it should be the same procedure, I would say: open the file and write to it.
Maybe the writing can even be generalized, i.e., one only passes an object that manages access to the file (mdpa, JSON, HDF5) to the ModelPart-writing routine.
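A minimal Python sketch of that idea, with all class and function names hypothetical: the writing routine only sees an object implementing `write_nodes`, and each file format provides its own backend. The MDPA-like backend follows the `Begin Nodes` / `End Nodes` block style; the JSON layout is an arbitrary choice for illustration.

```python
import io
import json
from typing import Protocol, Sequence, Tuple

Node = Tuple[int, float, float, float]  # (id, x, y, z)


class MeshWriterBackend(Protocol):
    """Hypothetical interface: one object per file format, owning all file access."""

    def write_nodes(self, nodes: Sequence[Node]) -> None: ...


class MdpaLikeBackend:
    def __init__(self, stream):
        self.stream = stream

    def write_nodes(self, nodes):
        self.stream.write("Begin Nodes\n")
        for node_id, x, y, z in nodes:
            self.stream.write(f"{node_id} {x!r} {y!r} {z!r}\n")  # repr keeps full precision
        self.stream.write("End Nodes\n")


class JsonBackend:
    def __init__(self, stream):
        self.stream = stream

    def write_nodes(self, nodes):
        json.dump({"nodes": {"ids": [n[0] for n in nodes],
                             "coordinates": [list(n[1:]) for n in nodes]}}, self.stream)


def write_model_part(nodes: Sequence[Node], backend: MeshWriterBackend) -> None:
    """Format-agnostic routine: it never touches the file directly."""
    backend.write_nodes(nodes)


if __name__ == "__main__":
    out = io.StringIO()
    write_model_part([(1, 0.0, 0.0, 0.0)], MdpaLikeBackend(out))
    print(out.getvalue())
```

Adding an HDF5 backend would then mean writing one more class, without touching `write_model_part`.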
@msandre you have more experience here, what do you think?
Just out of curiosity, here was my experiment with JSON:
Kratos/kratos/includes/json_io.h
It used to work; however, it was based on RapidJSON, so I am not convinced it will still work today. Anyhow, it was just a first attempt (completely unilateral, too), so I have no intention of defending it in any way.
Regarding XDMF (following @nschloe's suggestion), we should take a look. My problem, however, is that it is XML-based, and so far we have tried not to use XML. (I personally much prefer the JSON format over XML.)
Yeah, I know about the JSON trial.
Regarding XDMF, I have the same opinion: I also prefer JSON over XDMF. I am also not sure how usable XDMF is for large models. We could probably use binary data and/or compression, but then we (i.e., the preprocessor) would again depend on some library, which would be the same situation as for HDF5.
If the preprocessor supports Python, one can use h5py (pip-installable).
I just want to comment that an easy way to increase reading speed and reduce the file size would be to include the "colors" in the file (tags identifying the submodelparts each entity belongs to), so that the submodelparts can be read at the same time as the elements/nodes, and the submodelpart section would be simplified.
@loumalouomega could you please explain, I don’t fully understand :)
The problem I see with the JSON trial is that it's still a custom format; we just changed the way it's displayed.
The suggestion, as I understand it, goes in the direction of adopting an existing format, the same way we support writing VTK, UNV, etc. I agree that such a format may not support all the features we include, hence the `<Information>` tag from XDMF.
In this direction I also see things as @philbucher does: if GiD supports writing HDF5, we should have no problem adopting it in Kratos. Moreover, HDF5 has a C++ interface, so even if one cannot or does not want to install h5py, we should be able to provide it from Kratos.
Right now the submodelparts are stored in a section of the mdpa. This section is huge and, in my opinion, redundant. Using https://github.com/KratosMultiphysics/Kratos/blob/master/kratos/utilities/assign_unique_model_part_collection_tag_utility.cpp it is possible to create tags that identify, with a single ID, all the submodelparts an entity belongs to.
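The idea behind these tags ("colors") can be sketched in a few lines of Python: every distinct combination of submodelparts gets one integer, so each entity stores a single tag, and a small table maps tags back to submodelpart names. This only illustrates the concept under that assumption; it is not the algorithm or API of the Kratos utility referenced in the comment, and the function name is hypothetical.

```python
def assign_collection_tags(memberships):
    """memberships: entity id -> iterable of submodelpart names the entity belongs to.

    Returns (tags, collections): tags maps entity id -> int tag, and collections maps
    tag -> sorted tuple of submodelpart names. Tag 0 means "no submodelpart".
    """
    collections = {(): 0}
    tags = {}
    for entity_id in sorted(memberships):
        key = tuple(sorted(set(memberships[entity_id])))  # order-independent combination
        if key not in collections:
            collections[key] = len(collections)  # next free tag
        tags[entity_id] = collections[key]
    return tags, {tag: names for names, tag in collections.items()}


if __name__ == "__main__":
    print(assign_collection_tags({1: ["Inlet"], 2: ["Inlet", "Wall"], 3: ["Wall", "Inlet"]}))
```

The per-entity data shrinks to one integer, and the (usually short) tag table replaces the long per-submodelpart ID lists.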
I think we should at least this time not reinvent the wheel
:+1:
We already have hdf5 support for the ModelPart! This is working for a while already but we don't use it for this purpose. I don't think there is currently a better format available.
Some clarifications: HDF5 is not a mesh format; it's a general data format, like JSON. You can store a mesh in HDF5 alright, but you'll have to specify what your field names are, in which order you store the elements, etc. In other words: you'd create a container format that uses HDF5 internally.
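To make the point concrete, here is a Python sketch of a tiny JSON-based mesh container. Every key name, the pairing of IDs with coordinates, and the version check are conventions invented for this example, and choosing them is exactly what defines a new format on top of a general-purpose one.

```python
import json

# Conventions a container format has to pin down (all names here are hypothetical).
LAYOUT = {
    "format": "example-mesh",  # identifies the container format
    "version": 1,              # lets readers reject or migrate older layouts
}


def mesh_to_json(node_ids, coords, triangles):
    """Serialize a triangle mesh; IDs and data live in separate arrays."""
    return json.dumps({
        **LAYOUT,
        "nodes": {"ids": node_ids, "coordinates": coords},  # coords[i] belongs to ids[i]
        "cells": [{"type": "triangle", "connectivity": triangles}],  # entries are node IDs
    })


def mesh_from_json(text):
    """Parse the container, refusing layouts this reader does not know."""
    data = json.loads(text)
    if data.get("format") != LAYOUT["format"] or data.get("version") != LAYOUT["version"]:
        raise ValueError("unsupported mesh layout")
    return data["nodes"]["ids"], data["nodes"]["coordinates"], data["cells"]


if __name__ == "__main__":
    text = mesh_to_json([1, 2, 3], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[1, 2, 3]])
    print(mesh_from_json(text))
```

Swapping `json` for HDF5 datasets would change the storage, but the same list of conventions would still have to be written down and versioned.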
Note that there are at least three mesh formats which already do that: MOAB, MED, and XDMF.
Storing a mesh "in HDF5" or "in JSON" essentially means to create a new mesh format that only Kratos uses, just like MDPA. I would not recommend that.
Ideally, Kratos would use a mesh format that is already out there and has some ecosystem support, e.g., something that can be opened by ParaView and be consumed by most software packages.
In any case we can add some minor improvements ensuring always retrocompatibility, and we can add them by stages
What you should try to avoid is creating a series of incompatible formats. People will use them, store their meshes, and still try to use them ten years from now. You'll have to support them for a long time.
i agree with Philipp that rather than further extending mdpa, we should concentrate on a more standard input model.
:+1:
My problem, however, is that it is XML-based, and so far we have tried not to use XML. (I personally much prefer the JSON format over XML.)
:+1: I also prefer JSON and YAML; it's a pity they chose XML at the time, but in their defense, it was really popular then. It's not too bad either.
Some other clarifications: XDMF can use HDF5, but it can also store the data inline, in binary or ASCII. If you care about being able to read a mesh file with your own eyes, the latter would be an option. Neither inline binary nor ASCII requires any packages other than Python's built-in ones, so that's a plus, too.
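A sketch of the inline-ASCII variant using only Python's standard library (`xml.etree.ElementTree`). It writes a single triangle grid with `Format="XML"` data items plus optional `<Information Name=... Value=...>` entries. The element names follow the XDMF model; which attributes a given reader (e.g. ParaView) strictly requires is an assumption to verify against the XDMF documentation. XDMF connectivity refers to 0-based positions in the geometry array, not to node IDs.

```python
import xml.etree.ElementTree as ET


def xdmf_inline(points, triangles, info=None):
    """Build a minimal XDMF document with all data stored inline as ASCII."""
    root = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="mesh", GridType="Uniform")

    topo = ET.SubElement(grid, "Topology", TopologyType="Triangle",
                         NumberOfElements=str(len(triangles)))
    item = ET.SubElement(topo, "DataItem", Dimensions=f"{len(triangles)} 3",
                         NumberType="Int", Format="XML")
    item.text = " ".join(str(i) for tri in triangles for i in tri)  # 0-based point indices

    geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    item = ET.SubElement(geom, "DataItem", Dimensions=f"{len(points)} 3",
                         NumberType="Float", Precision="8", Format="XML")
    item.text = " ".join(repr(c) for p in points for c in p)  # repr keeps full precision

    for name, value in (info or {}).items():  # free-form metadata, e.g. submodelpart names
        ET.SubElement(grid, "Information", Name=name, Value=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(xdmf_inline([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)]))
```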
Dear @nschloe,
you are of course right about what you say about HDF5 and JSON (or XML).
I also agree with your comment about backward compatibility.
Still, using JSON or any of these formats frees us from the parsing work, which is already a big simplification. In this light, adopting a storage format other than our own already simplifies our lives.
My big problem here is that, even though it is relatively easy to write an IO for Kratos for a given file format, it is not at all an easy task to have the preprocessor (read: the GUI) write into a given format, particularly if an external library is needed, much less to do this in a way that is portable across operating systems. (For example, we wouldn't be able to do it from GiD, which is our main preprocessor.)
Still, I will take a look at the formats you point out, to get new ideas.
Thanks a lot, @nschloe, for the explanations and comments, very helpful!
I briefly took a look at your suggestions; I guess each of them has advantages and disadvantages.
When I have time I will try to do some more research / make a short summary about the available formats
(@RiccardoRossi MOAB is LGPL; this is not compatible with BSD, right?)
It depends on the LGPL version. It is like MMG: you can write private code that uses it, but you must always release the modifications you make to the library, and it cannot be packaged together with your code (that's why you must download MMG separately).
I don't know the details of MOAB, but we cannot rely on LGPL components for our core functionality.
I would try to find an alternative to XDMF. Its documentation is not great, IMO, and I have encountered some inconsistencies in its implementations. It would be good to go with a format that has lots of existing tools available.
Thanks, @nschloe, for your comments and suggestions. In fact, you are right on most of the points, and as the writer of the last three Kratos formats, I would be the one to blame.
Here is some history, along with thoughts and comments, about how we arrived at this point and how to proceed:
Turning back to your original comment, my question is: what is your performance issue with the current ModelPart IO? Just to mention: reading an mdpa with several node blocks would be very slow due to the internal check for duplicated node IDs.
Today I discovered this via the OpenFOAM repo: https://github.com/ornladios/ADIOS2 (it is not theirs, but they are developing the IO for this format).
Interesting!
@KratosMultiphysics/technical-committee considers that XDMF can be eventually used as an alternative input to Kratos. There is some ongoing work to make this happen.
However, we consider that we need our own default input. We do agree, however, that we should free ourselves from the parsing burden, for example by adopting JSON as a container format. This does not close the door to other alternative containers.
For now, we are closing this issue.
Thanks, all, for the suggestions; please reopen if you think we did not address this properly.
EVEN BETTER, of course, would be to ditch this custom format and use one of the million existing ones. Yes, yes, I get it, they don't exactly fit your use case.
Perhaps XDMF is for you. It's well adopted, well written, and has an `<Information>` field that can be used for just about anything.