alexrobomind / fusionsc

FusionSC
MIT License

Standard file format to cooperate or transfer #3

Open WenyinWei opened 9 months ago

WenyinWei commented 9 months ago

Hi, Alex, I also wrote some GUI code in Jupyter notebooks, Qt, Electron apps and so on a long time ago (almost three years ago, I guess). To my despair, I found that nobody continued or built on my code, since everybody has their own job to do and most people are not curious about topics that do not directly belong to their PhD projects. A GUI might be less accessible, and thereby less influential, than a common data format. So I suggest that we could, and should, establish some rules for our data files, such as those for field-line tracing, diffusion, and invariant manifold growth (1D manifolds for 2D maps, 2D manifolds for 3D flows, or other cases; higher dimensions may also be interesting, so we can reserve some space for them and stay compatible).

Here is a list of data formats that could be standardized:

(I use trajectory/orbit for continuous/discrete-time dynamical systems, respectively)

I am considering making it easy to import this data into Blender, directly or indirectly, in just a few steps (a short Python script is acceptable). Unreal Engine and Unity are disregarded for the moment, since they would demand too much expertise from our users.

I consider CSV a central data format that all other formats can be converted from or to (N-1 connections), which is simple enough. Without such a centralized format, we would need N(N-1) functions to convert between every pair of formats. I think we should have both discrete (how to say that? scattered?) data formats like this and packed formats like your .fsc (which looks a bit closed-source: "Don't touch me." hahahaha :) ). I am considering supporting STL, OBJ, or FBX (FBX allows the mesh to change over time, but it seems to be a proprietary format, or at least its SDK is offered by Autodesk? I am not sure) so that users can import them directly.
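A minimal sketch of the hub-and-spoke idea, assuming CSV text as the hub and two made-up in-memory layouts ("rows" and "columns") as the other formats. All names here are hypothetical illustrations, not fusionsc code; swapping the hub for another format would only change the reader/writer pairs.

```python
import csv
import io
from typing import Any, Callable, Dict

# Hub-and-spoke conversion: CSV text is the central (hub) format, so every
# other format only needs one reader (format -> CSV) and one writer (CSV -> format).
to_csv: Dict[str, Callable[[Any], str]] = {}
from_csv: Dict[str, Callable[[str], Any]] = {}


def register(name: str, reader: Callable[[Any], str], writer: Callable[[str], Any]) -> None:
    to_csv[name] = reader
    from_csv[name] = writer


def convert(data: Any, src: str, dst: str) -> Any:
    """Convert between any two registered formats by going through the hub."""
    return from_csv[dst](to_csv[src](data))


# Hypothetical in-memory layouts of (x, y, z) points.
def rows_to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def csv_to_rows(text):
    return [tuple(float(v) for v in row) for row in csv.reader(io.StringIO(text))]


def columns_to_csv(cols):
    return rows_to_csv(zip(cols["x"], cols["y"], cols["z"]))


def csv_to_columns(text):
    rows = csv_to_rows(text)
    return {key: [row[i] for row in rows] for i, key in enumerate("xyz")}


register("csv", lambda text: text, lambda text: text)  # the hub itself: identity
register("rows", rows_to_csv, csv_to_rows)
register("columns", columns_to_csv, csv_to_columns)


def converters_needed(n_formats: int, hub: bool) -> int:
    """Pairwise: one function per ordered pair. Hub: reader + writer per non-hub format."""
    return 2 * (n_formats - 1) if hub else n_formats * (n_formats - 1)
```

With N formats, the hub needs one reader and one writer per non-hub format, 2(N-1) functions in total, versus N(N-1) direct pairwise converters; for 10 formats that is 18 versus 90.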

I have been more or less on vacation, so I need some time to digest the logic of your code.

alexrobomind commented 9 months ago

Hi Wenyin,

the client-facing side of the Python library (which is what I almost always recommend using) outputs NumPy arrays and should accept NumPy array-likes. Python and NumPy have excellent support for writing and reading CSV (and other formats). Geometries can be converted to PyVista geometries, which can be saved (among other formats) as PLY (binary or ASCII) and STL. Blender, Unreal, and Unity can all read those formats.
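For small data sets, the CSV round trip mentioned above might look like this. The (n, 3) array of x, y, z columns is a made-up stand-in for a field-line trace; the actual layout of fusionsc's output arrays is not assumed here.

```python
import io

import numpy as np

# Hypothetical field-line trace: one row per point, columns x, y, z.
# This layout is an illustrative assumption, not fusionsc's output format.
points = np.array([
    [1.0, 0.0, 0.1],
    [0.9, 0.4, 0.1],
    [0.7, 0.7, 0.2],
])

# Write CSV with a header row; a real script would pass a filename like "trace.csv".
buf = io.StringIO()
np.savetxt(buf, points, delimiter=",", header="x,y,z", comments="")

# Read it back, skipping the header row.
buf.seek(0)
loaded = np.loadtxt(buf, delimiter=",", skiprows=1)
```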

The .fsc format is definitely not designed to be attractive to most outside applications (it is powerful and very, very fast, but heavily dependent on Cap'n Proto), but since I expanded the data storage to handle most common Python types, I find myself drifting more and more towards using it as the "one ring to rule them all" for the things I do in Python.

So the question is: how can we improve on what is already there? I see two obvious cases here:

Do you see any other opportunities?

alexrobomind commented 9 months ago

Addendum:

I think the best format for writing array data would definitely be NetCDF4. You get HDF5 compatibility (NetCDF4 is a slightly restricted HDF5), and every remotely serious data-processing environment can read it. Text formats like CSV are hideously large and slow, so I would really prefer to avoid them if possible; they become real troublemakers as datasets grow. Anyone who really needs CSV will probably have to tailor the output in some way anyway (CSV is hilariously under-defined).
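To make the size argument concrete, here is a rough comparison, using synthetic random data rather than real trace output, between the raw binary size of a float64 array (about what an uncompressed NetCDF4/HDF5 variable holds, ignoring metadata) and the same array written as full-precision CSV.

```python
import io

import numpy as np

# Synthetic stand-in for a large trace: 100,000 points with 3 coordinates each.
rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 3))

# Raw binary size of the float64 values (8 bytes each).
binary_bytes = data.nbytes

# Same values as CSV, using NumPy's default full-precision "%.18e" format.
buf = io.StringIO()
np.savetxt(buf, data, delimiter=",")
csv_bytes = len(buf.getvalue().encode("ascii"))

print(f"binary: {binary_bytes} B, csv: {csv_bytes} B, ratio: {csv_bytes / binary_bytes:.2f}")
```

Each value takes at least 25 characters in this CSV (24 for the "%.18e" digits plus a separator) versus 8 bytes in binary, so the text version comes out more than three times larger, before any parsing cost is counted.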

alexrobomind commented 9 months ago

As of 512ce01c91282082de8d9a532c8d6bf8e86ef162, there is now an export module (fusionsc.export) to export field-line traces as .mat or .netcdf files. I am still working on the geometry.

Do you think we should also export to JSON? CSV is really not good enough because of the structured nature of the data.
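A sketch of why JSON copes better with structured data than CSV: field lines of different lengths and nested metadata map directly onto JSON objects and lists. The layout and key names below are hypothetical, not fusionsc's export schema; NumPy arrays are converted through the `default` hook of `json.dumps`.

```python
import json

import numpy as np

# Hypothetical structured result: field lines of different lengths plus metadata.
# Key names are placeholders, not a fusionsc export schema.
result = {
    "meta": {"description": "example trace", "units": "m"},
    "fieldlines": [
        {"start": np.array([6.0, 0.0, 0.0]),
         "points": np.array([[6.0, 0.0, 0.0], [5.9, 0.5, 0.0], [5.7, 1.0, 0.1]])},
        {"start": np.array([6.2, 0.0, 0.0]),
         "points": np.array([[6.2, 0.0, 0.0]])},
    ],
}


def encode_numpy(obj):
    """json.dumps calls this for objects it cannot serialize natively."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):  # NumPy scalars such as np.float64
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


text = json.dumps(result, default=encode_numpy, indent=2)
restored = json.loads(text)
```

The ragged lengths (three points versus one) survive as-is; in CSV they would need padding, an extra index column, or separate files.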

WenyinWei commented 9 months ago

Great, Alex, that was fast! Let me give it a try. Since I prefer one centralized data format over many supported ones, which would scatter our development efforts, I think exporting to NetCDF4 is enough. CSV is very convenient for small Poincaré orbit data, so I used to use it regularly. But you are right: it doesn't work for large volumes, which means it's not suitable as the centralized format. May I suggest adding conversions from NetCDF4 to geometry files like STL, PLY, and FBX?
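As one possible shape for such a conversion, here is a minimal writer for ASCII PLY point clouds, which Blender and similar tools can import. It is an illustrative sketch, assuming the Poincaré points are already in an (n, 3) NumPy array (for example, read back from a NetCDF4 file); it does not use any fusionsc function, and the circle of points is placeholder data.

```python
import io

import numpy as np


def write_ply_points(f, points):
    """Write an (n, 3) array to a text stream as an ASCII PLY point cloud (vertices only)."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("expected an array of shape (n, 3)")
    f.write("ply\n")
    f.write("format ascii 1.0\n")
    f.write(f"element vertex {len(points)}\n")
    f.write("property float x\nproperty float y\nproperty float z\n")
    f.write("end_header\n")
    np.savetxt(f, points, fmt="%.9g")  # one "x y z" line per vertex


# Synthetic Poincaré-style points on a circle (placeholder data).
phi = np.linspace(0.0, 2.0 * np.pi, 4, endpoint=False)
poincare = np.column_stack([6.0 + 0.5 * np.cos(phi), np.zeros_like(phi), 0.5 * np.sin(phi)])

buf = io.StringIO()  # in practice: with open("poincare.ply", "w") as f: ...
write_ply_points(buf, poincare)
```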

WenyinWei commented 3 months ago

Hi Alex, for a tokamak, say EAST, or an unnamed one still in agile design, how should I set up the .fsc file, or objects like Geometry, MagneticConfig, and CoilFilament, independently from a rectangular grid or from their data files? The external coil geometry may change frequently during optimization. I would like to refactor my code before my leave so that it is properly based on fusionsc.

alexrobomind commented 3 months ago

Hi Wenyin,

the .fsc files are a special optimized format that is very hard to write from outside fusionsc. They are not really meant for feeding data into or out of the code, but rather for internal storage before you export things again. So creating the inputs should probably be done in other ways. Once you have a field, coil, or geometry, you can save it with the .save method of its class and load the file again with the .load method.

For geometries, the easiest way to load an external one is from a .STL or .PLY file with Geometry.importFrom(...). That method supports many standard mesh formats through the meshio library.
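If a design workflow produces raw vertex and triangle arrays rather than mesh files, a small ASCII STL writer like the one below could produce input for Geometry.importFrom. This is an illustrative sketch, not part of fusionsc, and the unit square is placeholder geometry; with meshio installed, building a meshio.Mesh and calling its write method would be the more robust route.

```python
import io

import numpy as np


def write_ascii_stl(f, vertices, triangles, name="geometry"):
    """Write a triangle mesh to a text stream in ASCII STL format.

    vertices: (n, 3) float array; triangles: (m, 3) integer indices into vertices.
    """
    vertices = np.asarray(vertices, dtype=float)
    triangles = np.asarray(triangles, dtype=int)
    f.write(f"solid {name}\n")
    for tri in triangles:
        a, b, c = vertices[tri]
        normal = np.cross(b - a, c - a)
        length = np.linalg.norm(normal)
        if length > 0:  # leave degenerate triangles with a zero normal
            normal = normal / length
        f.write("  facet normal {:.9g} {:.9g} {:.9g}\n".format(*normal))
        f.write("    outer loop\n")
        for v in (a, b, c):
            f.write("      vertex {:.9g} {:.9g} {:.9g}\n".format(*v))
        f.write("    endloop\n")
        f.write("  endfacet\n")
    f.write(f"endsolid {name}\n")


# Placeholder mesh: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
square_triangles = [[0, 1, 2], [0, 2, 3]]

buf = io.StringIO()  # in practice: with open("tile.stl", "w") as f: ...
write_ascii_stl(buf, square_vertices, square_triangles)
```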

For the magnetic config and the coil filaments, the matter is trickier. To my knowledge, there are no standardized formats for these. I prefer to let users decide for themselves how they want to load coils and fields and convert them in Python (e.g. once they have NumPy arrays, they can create them using CoilFilament.fromArray and MagneticConfig.fromComputed). I think most researchers know Python well and probably have their own exotic formats to load from. That seems better than trying to support 15 formats invented by 15 people.
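As an example of the kind of user-side loader meant here, the sketch below parses a hypothetical text format (one "x y z" point per line, blank lines between coils, "#" comment lines) into a list of NumPy arrays. The file format is invented for illustration, and whether CoilFilament.fromArray expects (n, 3) or (3, n) arrays is an assumption to check against its documentation before passing the results on.

```python
import io

import numpy as np


def load_coils(f):
    """Parse a hypothetical coil text format into a list of (n_i, 3) arrays.

    One "x y z" point per line, blank lines separate coils, "#" starts a comment line.
    """
    coils, current = [], []
    for raw in f:
        line = raw.strip()
        if line.startswith("#"):
            continue
        if not line:
            if current:  # a blank line closes the current coil
                coils.append(np.array(current, dtype=float))
                current = []
            continue
        current.append([float(v) for v in line.split()])
    if current:  # the last coil may not be followed by a blank line
        coils.append(np.array(current, dtype=float))
    return coils


example = io.StringIO(
    "# coil A\n1 0 0\n0 1 0\n-1 0 0\n\n"
    "# coil B\n2 0 0.5\n0 2 0.5\n"
)
coils = load_coils(example)
```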