christopherpoole / CADMesh

A CAD file interface for GEANT4
MIT License
146 stars 66 forks source link

New Functionality and API Changes #15

Closed Supernabla closed 7 years ago

Supernabla commented 7 years ago

Hi, appreciate your work! Saved me a lot of time.

I've been working on the integration of tetrahedral-mesh geometries and nested polygon-mesh geometries with GATE (work in progress, not included yet) and started to use CADMesh for this purpose. While working with CADMesh I applied several changes:

Please, have a look at it and give me some feedback. In case you like it, I'd also be eager to change the Readme and add some explanations bout the new API.

christopherpoole commented 7 years ago

Thanks for this.

I have been working on a more comprehensive file reading solution myself, which I have just pushed to the master branch. This includes a built in lexer for tokenising text based files, which custom file parsers can be built on top of. Note that I am still working on documenting some of this, but the new built in STL file reader serves as a good example of the direction I am going in, as does thisunit test.

I will address some of the specific points you mentioned, which I may put into separate issues:

I will look at the suggested changes in more detail shortly.

janmayer commented 7 years ago

Hi, thank you all for working on this.

I plan to use CAD Files a lot more in my simulations, but I'm not sure if this is going in the right direction.

In my opinion CADMesh fulfills the role of glue code, translating

It is not feasible to merge them into a single output format without loosing the flexibility of e.g. having a G4Solid to throw around. This results in these two distinct usages (here with v1.1)

auto mesh = CADMesh((char *)GetCadFile("file").c_str());
mesh.SetScale(mm);
auto lV = new G4LogicalVolume(mesh.TessellatedMesh(), G4Material::GetMaterial("G4_Al"), "file_lV");
new G4PVPlacement(rot, G4ThreeVector(), lV, "file_pV", worldLV, false, 0, checkOverlaps);

or

auto mesh = CADMesh((char *)GetCadFile("file.ply").c_str(), (char *)"PLY", G4Material::GetMaterial("G4_Al"));
auto aV = mesh.TetrahedralMesh();
aV->MakeImprint(worldLV, trans, rot, 0, checkOverlaps);

with one ascii-ply file per component and material. (And annoying char conversions.)

I will have several cases where I use G4Solids from tessellated and native Geant4 geometries alongside and interchangeably.

I would argue that CADMesh could consist out of two classes in two header-only files - one for tessellated (accepts everything assimp accepts) and one for tetrahedral (accepts everything tetgen accepts) geometries. The user would be responsible for providing tetgen and assimp (or only one, depending on the usecase).

Imho the helper, region attributes, etc. in this commit or the lexer, reader, etc. in the master branch add too much bloat to the original, small single class codebase (v1.1: Total 172+343 lines). While support for many file types and use cases with special needs is nice in principle, there are external CAD Programs that have to be used anyhow, and exporting the things as individual ply files should be easy enough.

Hypothetical example usage:

#include "CADMesh/TesselatedMesh.h"
auto mesh = CADMesh::TesselatedMesh("file.ply", mm);
auto lV = new G4LogicalVolume(mesh.GetSolidVolume(), G4Material::GetMaterial("G4_Al"), "file_lV");
new G4PVPlacement(rot, G4ThreeVector(), lV, "file_pV", worldLV, false, 0, checkOverlaps);

or

#include "CADMesh/TetrahedralMesh.h"
auto mesh = CADMesh::TetrahedralMesh("file.ply", mm);
auto aV = mesh.GetAssemblyVolume();
aV->MakeImprint(worldLV, trans, rot, 0, checkOverlaps);

Thank you for your considerations.

Supernabla commented 7 years ago

This discussion gets going, nice! As far as I can see, Christopher's and my code bases have diverged quite a bit, so I'm closing my pull request for now...

christopherpoole commented 7 years ago

Thanks Jan. Yes I agree that CADMesh is glue between Geant4, and the assimp and tetgen geometry libraries.

Having said that, over the years I have had a lot of emails about people just wanting to use one library, or the other, or different geometry libraries entirely. What is currently on the master branch is a reflection of that.

Readers

The final goal is for a uniform interface to various libraries (not just assimp or tetgen), where the libraries/readers used can be chosen by the user at compile or run time.

Use the default reader (whatever that may be):

auto mesh = CADMesh::TessellatedMesh::FromSTL(filename); // Use he default reader.

Or specify the exact reader to use:

auto mesh = CADMesh::TessellatedMesh::FromSTL(filename, CADMesh::Reader::ASSIMP());

A fair few users just want to load an STL or PLY file into their geometry, and they don't care how that happens. However the assimp dependency is quite large, and makes that process harder than it needs to be in my opinion. Hence a few built-in readers for simple text file formats. Equally, I do agree that having built-in support for this does add code bloat. In the instance that a user just wants the assimp reader without other readers or the built-in readers, only 3 or 4 source files would be compiled:

These classes are basically a refactoring of the original code, and in your use case are essentially all that would be, and should be compiled. (I have made an issue of this here, as I do agree you should be able to build the absolute bare minimum if you wish.)

Custom file readers are also something I wish to support.

Testing

Testing is a primary concern in all of this I have described, and in future development. Previously testing of CADMesh has been performed using the limited examples. Now there is a proper setup for unit testing everything - not just the cases where reading is successful, but testing behavior on invalid input as well.

Another aspect to the built-in Lexer, and the associated parsers is that it affords the opportunity to provide the users with detailed debugging information when reading in files, see here for example where we can pick up an error in an STL file, and report the line number in the STL file where it occurs.

Headers Only

I have considered making CADMesh headers only. In the past on other projects I have found that it makes development of the tool itself much harder. The current approach is to nominate the configuration at compile time (of CADMesh), and store this configuration.

Maybe there is opportunity to automatically generate a headers only build...

Annoying char Conversions

This should be getting fixed, as G4String is (should be) used throughout CADMesh now.

Tetgen and scientific notation

Finally, this is something I have had intermittent trouble with, but I have never really nailed down the actual problem. A built-in STL file reader fixes this.

Supernabla commented 7 years ago

@janmayer, just a few comments to let you know what I agree/disagree with:

In my opinion CADMesh fulfills the role of glue code, translating

  • assimp to a G4Solid and
  • tetgen to a G4AssemblyVolume of G4Tet

I disagree in terms of TetGen, because to me it's at least unclear what implications TetGen's AGPL licensing has for the distribution of CADMesh. For any private user of CADMesh, it really doesn't matter. But if bigger collaborations want to establish CADMesh as a permanent dependency of their projects, that might be showstopping. However, IANAL, so I'd be glad if you can convince me of the opposite.

Moreover, it's not a big deal to implement a file parser for TetGen's ELE file, be it my simple line-by-line std::ifstream method or @christopherpoole's Lexer. I think it's worth the work instead of using TetGen just to parse one type of files.


It is not feasible to merge them into a single output format without loosing the flexibility of e.g. having a G4Solid to throw around.

I agree, I think the most basic read operation should always result in direct access to G4TessellatedSolid* or a smth. like a std::vector<G4Tet*> respectively. That's what I tried to achieve with my suggested API (amongst other things...). Translated to your examples:

cadmesh::PolygonMeshFileReader fileReader(mm);
G4TessellatedSolid* mesh = fileReader.ReadMesh(fPath);
// define logical etc.

or

cadmesh::TetrahedralMeshFileReader fileReader(mm);
std::vector<G4Tet*> tetMesh = fileReader.ReadTetrahedralMesh(fPath);
// establish assembly yourself

I think concerning the user interface we're on the same page.


I would argue that CADMesh could consist out of two classes in two header-only files

In a scenario where CADMesh doesn't implement a custom file parser you could definitely put everything in two header files. But as a consequence you would expose the Assimp (TetGen) dependency through these headers, i.e. including cadmesh.hh (CADMesh.hh) would always implicitly include things like assimp/Importer.hpp. Although, this is not really an obvious problem, it will add unnecessary object definitions to a global scope.


Imho the helper, region attributes, etc. in this commit or the lexer, reader, etc. in the master branch add too much bloat to the original, small single class codebase (v1.1: Total 172+343 lines).

Yeah, that'is probably true. Looking at @christopherpoole 's code I thought the same. Now, looking at my code again, I also feel like it implements stuff which doesn't belong to CADMesh, but rather to an application...

Supernabla commented 7 years ago

Some Brainstorming on the 'Master' branch

Is the CRTP base class stuff and TetrahedralMesh or TessellatedMesh really necessary? Some of the data members TetrahedralMesh and TessellatedMesh share are redundant, e.g. offset_. Others logically belong to the reader classes, e.g. scale_. In the sense of: the file reader should know how to interpret the unit of length.

Let's go back to what the persistent data is, on the CADMesh/Geant4 side:

Why not provide the reader classes with G4double scale and implement the Read() methods to generate this data, e.g.

Note, that here, the reader objects are specific to the output data they produce, while the capability of reading different file types is encapsulated in the Read() method.


That being said, there are mainly three point's I don't understand:

  1. Why are TetrahedralMesh or TessellatedMesh necessary as another wrapper around the reader classes?
  2. Why do you want the reader classes to be specific to the data type they can handle, instead of the output data they produce?
  3. Why is it necessary for the reader classes to keep shared ownership of the data they produce?

I always think of it being analogue to reading an image. You usually have a reader that can implicitly handle different file types, e.g. .png, .jpeg, .tiff, etc. And the reader would usually read the data into a buffer and hand over the ownership.


That's it for today. I hope this discussion helps! Sorry for being so insistent...

christopherpoole commented 7 years ago

Yep there are probably redundant methods and members, as the master branch is currently under development.

STL file and PLY files for example don't encode units. It is up to the user to specify the expected units at the time they place the volume, or just before.

Any custom reader derived from Reader just needs to implement the method G4bool Reader::Read(G4String filepath) and Reader::CanRead(File::Type file_type). In the Read method it simply has to call the method AddMesh(Mesh::New(triangles, name)) as many times as required. Everything else, including conversion to a tesellated solid or assembly is handled by CADMesh. Look at the ASSIMPReader as an example. It is trivially easy this way to add new readers, whether they be custom, or wrap another library, and as all readers share a base class, they can be interchanged at run time.

For your points:

  1. I have the need to select different reader backends at both compile time, and run time. Along with my need for custom readers.
  2. For CADMesh, readers just get CAD files into G4TessellatedSolids (or assembly volumes in the case of assemblies of meshes or tetrahedra) - there is only one kind of output. Different readers support different CAD file formats. If the reader you select can't read the file you have chosen, that is a fatal exception at run time, not necessarily at compile time. Also by using something like TessellatedMesh::FromPLY, that is very explicit as to what input file is expected to be provided, the output is always the same. Which reader is used might not matter, but can be specified optionally.
  3. The ASSIMP reader for example requires the importer stick around for the life of any object that it has imported. By copying data into the Mesh object, we can get rid of the importer earlier.

Note that the master branch is under development, so the current code may not reflect what I have said here exactly, which might be different to what I have in my local branch. I have also produced much more documentation to go along with all of this, although I haven't finished or pushed it yet.

janmayer commented 7 years ago

Well, that escalated quickly.

However the assimp dependency is quite large, and makes that process harder than it needs to be in my opinion.

I'm not sure what you mean here. Assimp seems to be in active development, compiles, and can even be installed with package managers (yum list *assimp*: assimp.x86_64, assimp-devel.x86_64, python2-assimp.noarch).

or different geometry libraries entirely.

May I ask which ones and why?

Tetgen

There seem to be some problems with the tetgen library, e.g. namespace pollution, doesn't read files correctly, doesn't read binary files at all (super annoying), license issues, etc.

Pulling out my crystal ball of predicting the future, I would guess that a re-implementation would be quite beneficial. I've only glanced over the code, 31k lines of pure C, and have no idea how hard it is to actually do the tetrahedralization. Not reading all the different file types yourself, not implementing all geometry operations from scratch, and not using C should lower that by at least an order of magnitude.

Looking at assimp

... designed to load various 3d file formats and convert them into a shared, in-memory format. It supports more than 30 file formats for import ...

The shared, in-memory format looks like a nice place to piggyback a tetrahedralization class and other readers onto. That would then result in the return to one class only: Read file into assimp; GetTesselatedSolid: runs through assimp; GetTetAssembly: runs through assimp +tetrahedralization class.

Maybe the assimp guys would even be interested in this - or maybe have something usable in the pipeline

(Note: I've never worked with assimp and have not looked at their code. Swap it with whatever library works.)

Supernabla commented 7 years ago

Well, that escalated quickly.

Vivid discussion can only help!

[...] and have no idea how hard it is to actually do the tetrahedralization.

Pretty non-trivial for anyone without a degree in applied maths. Re-implementation would be a hell of a task. In a mathematical sense TetGen is solid work as far as I heard and as far as I can judge. I'm still a satisfied user of TetGen and promote its use for tetrahedral mesh generation as an independent tool (I don't like either, that it doesn't read binary files...).

The crucial question is: Should CADMesh be able to generate tetrahedral meshes at run time? If so, TetGen is definitely necessary. If not, CADMesh is well of with file reading capability only.


Concerning other geometry libraries, even though you might know them already:

The MOAB library is a unified interface for unstructured meshes. A connected project, DAGMC, implements the use of unstructured mesh geometries (and octree navigation!) in Monte Carlo applications and even has limited Geant4 support (DAGSolid).

Maybe MOAB could also be interesting for CADMesh.

Cheers

janmayer commented 7 years ago

I did some more research on importing CAD files in Geant4 and ROOT. There is a lot of disarray, many half-finished-and-abandoned projects, or commercial solutions (mentioned in your paper).

There is a lot of stuff focused around GDML. Some working with Step files to create gdml with basic shapes, e.g. from FreeCAD or Catia, or using old OpenCascade stuff, One interesting thing I found was a direct converter from stl to gdml with tessellated solids, which is (maybe? probably?) equivalent to the tessellated part of cadmesh. (Replacing the assimp dependency with gdml. Pest or Cholera 😉.)

I also work with projects that unfortunately use CERN ROOT geometries with Virtual Monte Carlo. ROOT can import GDML, but does not support tessellated geometries at all. 😭 I've adopted the CADMesh Idea with tetgen for ROOT in https://github.com/janmayer/TCadToGeo. The name will probably change to something less general highlighting the tetrahedral nature some more.


The crucial question is: Should CADMesh be able to generate tetrahedral meshes at run time? If so, TetGen is definitely necessary. If not, CADMesh is well of with file reading capability only. That is indeed an important decision.

From developer perspective, dropping the tetgen dependency means:

For the user:

From this quick look I would probably go with the reader only.

I think there is room for a tiny package (one class) that does nothing but reads tetgen output and produces Geant4 Assembly Volumes. Maybe one could even get that pulled into the Geant4 source code itself.