GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1
203 stars 80 forks source link

Level of difficulty developing in GEOSX #1105

Closed rrsettgast closed 1 year ago

rrsettgast commented 3 years ago

On multiple occasions @joshua-white has stated that others find that GEOSX is "extremely difficult to develop in", and "not at all intuitive". I would like to build a list of issues that developers have that point to specific things that we may take action on, by either modifying the code, or targeted documentation. Please feel free to share your general experiences in developing new capabilities, but also please do include specifics that can lead to improvements.

I am tagging people because there GitHub only allows 10 assignees. I would specifically like feedback from everyone on this list. @af1990 @klevzoff @AntoineMazuyer @TotoGaz @castelletto1 @cssherman @CusiniM @juliatcamargo @shabnamjs @joshua-white @andrea-borio @francoishamon @taojinllnl @wrtobin @WuHuiLLNL @huang40

andrea-franceschini commented 3 years ago

I agree that sometimes there are difficult constructs. As an example, in the contact solver I need to retrieve the subregion representing the fracture, but it turns out to be ... https://github.com/GEOSX/GEOSX/blob/0bfd6e32597dc09d0ff41a67ed644d3d95977305/src/coreComponents/physicsSolvers/multiphysics/LagrangianContactSolver.cpp#L1379-L1382

and this was quite trick for me and not completely clear even now. Moreover, the way used to retrieve the constitutive law is also quite difficult for me https://github.com/GEOSX/GEOSX/blob/0bfd6e32597dc09d0ff41a67ed644d3d95977305/src/coreComponents/physicsSolvers/multiphysics/LagrangianContactSolver.cpp#L1409-L1412

Regarding the new finite element library, I am still studying it, so, no comments right now ... maybe later.

As for vectors: sometimes it was trick to understand when to use R1Tensor, array1D, plain C array ... Now it should be easier, with just array1D and C array.

joshua-white commented 3 years ago

@rrsettgast I really regret how I came across during the meeting yesterday. I was stressed and reacted poorly to a good design suggestion. I don't think GEOSX is "extremely difficult to develop in," and I am also very conscious that to get performance we have to constantly do things in non-standard ways. I probably sounded like a jerk undervaluing all the hard work everyone has been doing to write both a performant and usable code.

A focus on new developers is healthy, though, so I like this conversation chain.

I have two suggestions:

taojinllnl commented 3 years ago

@rrsettgast @joshua-white I am probably the newest developer in this team, and I'd like to share some of my personal experiences when I use GEOSX to develop/improve a solver.

My first encounter with GEOSX source code made me feel a little bit "intimidated", which is mainly due to my own lack of experience using object oriented language to develop FEM code (my past FEM development was mainly based on procedure oriented approach). However, after doing some step through exercises with several examples, I feel that I have a much better understanding about the general structure of the code. Also, I started to put a lot of print statements in the code, which helped me to practice how to get the variables I want. Generally speaking, the biggest challenges for me come from three aspects that I can now think of: 1. the logical flow of a FEM scheme is mixed with the code for performance-purpose implementation; 2. there are not enough "in-line" comments, so I have to guess what is the purpose for blocks of code. 3. Lamda functions always give me a headache.

I can think of two suggestions:

  1. In the user menu, maybe we can strongly encourage users who plan to use GEOSX for development purposes to get their hands dirty as early as possible. We can design some small tutorials for users to do the step-through exercise themselves. The best way to learn a code is not to read the code, but to play with it. I think this type of exercise is helpful for users who are not familiar with using object oriented programming for FEM code development;

  2. I strongly agree with @joshua-white. "In-line" comments are tremendously helpful. Also, it would be even better if "in-line" comments could provide enough syntax and logical details. For example, we can explicitly distinguish which part of the code is for performance purpose, and which part of the code is the FEM algorithm itself.

joshua-white commented 3 years ago

@taojinllnl The primary point of entry for most new developers is the PhysicsSolvers. It would take work, but we could write a series of developer "exercises" walking through how to build up a simple physics solver from scratch. This would basically be a series of homework assignments that step-by-step rebuild LaplaceFEM:

If someone makes it through that, the rest of the code should be much less intimidating.

francoishamon commented 3 years ago

When coding in GEOSX, I essentially always look for code snippets implemented by you (Randy), Sergey, or Ben, and then I adapt them to what I am trying to do. I assume a lot of us are doing the same thing. To facilitate this process for new users that are not necessarily aware of that, I agree with the suggestion of having some developer exercises to make sure that we follow good examples.

To be more specific about what I find "difficult" when coding/debugging in GEOSX, here are three examples with some suggestions when possible. By no means I want to blame anybody, but I am trying to be concrete so these issues can be discussed/addressed:

  1. I find it difficult to debug or understand the pieces of code that use the keyword auto. I recently tried to debug a problem related to property import in PAMELAMeshGenerator.cpp, and I can say that I was a little bit intimidated by the autos that make the code hard to understand. I would prefer to have the type names (even if they are long and complicated) rather than auto.

  2. I often end up in the FieldSpecification code when I am debugging a new capability (to check that BCs are accounted for, etc). I find this part of the code very good and generic, but I definitely struggle with some abstract C++ lambda constructs like https://github.com/GEOSX/GEOSX/blob/0bfd6e32597dc09d0ff41a67ed644d3d95977305/src/coreComponents/managers/FieldSpecification/FieldSpecificationBase.hpp#L703 and what is above/below in this function. Maybe this file/folder would benefit from some inline comments to make the generic constructs more "concrete". For developers, it would also be useful to be able to activate a logLevel to track what is going on step by step (because many times a wrong name is passed and the field is silently ignored).

  3. One last GPU-specific point: since PR #952, a lot of the <serialPolicy> keywords in the physics solvers have been converted to <parallelDevicePolicy<> >. It seems to me that if a developer wants to add a new physics solver coupled to an existing solver already assembled on device, then the new physics solver also has to be assembled in RAJA loops to pass the Lassen integrated tests. I think this is not trivial (at least, I struggled to pass the integrated tests on Lassen). We may want to add some GPU-specific debugging tips in the documentation (are your for-loops RAJA-loops? are your variables captured properly in the kernel launch? etc)

CusiniM commented 3 years ago

I think most of the difficulties I have encountered have already been mentioned by @francoishamon and @af1990. In my opinion difficulties may arise due to the following things some of which are, however, already well explained in the documentation (in my opinion) or are being addressed:

  1. Understanding the data structure and its relation to the xml files. This was my first difficulty but I think that now the documentation has been improved a lot and it should be sufficient to have a pretty clear idea about what is going on.

  2. Accessing the data structure. Sometimes it is not entirely clear to me how to access specific objects but my solution is usually, as mentioned by @francoishamon , to look for existing pieces of code and this has worked fine for me so far. I also find a bit confusing the use of getReference, getParent , getGroup etc. I had a bit of a hard time with these calls at the beginning but many of them have now been replaced with specific getters and I think this makes the code a lot easier to use. At times I have also struggled with the concept of viewsAccessors and I think that this could be added to the developer guide since it is something that can be useful when working on a physics solver.

  3. Lambdas. Some lambda construct were indeed a bit hard also for me but, in my opinion, it's just a matter of getting used to them. The syntax is sometimes not trivial but there is plenty of examples in the code.

  4. Data types. As mentioned by @af1990 I also had some doubts about what is the correct type to be used for certain variables (mainly arrays). However, I think that the recent work on the LvArray submodule and on the physicsSolvers has made things very clear.

  5. Debugging the code. Debugging is not straightforward and it can be a bit challenging at times. I think part of it is due to the data structure itself and part of is because of the use of lambdas which debuggers do not seem to like so much. Although I also end up using quite a bit of print statements at times, spending some time to learn how to use totalView rather than gdb has really boosted my productivity. Maybe it is worth having a short tutorial explaining how to debug with totalView and suggest it as a preferred tool for geosx.

  6. Working with the output. I think that, when developing, sometimes I would like an easy way to just plot a 1D or 2D solution. I am not a huge fan of Visit which I find quite complicated to use and I certainly think that it would be nice to have a much simpler way of visualising the solution, or part of it. Using Paraview with the vtk output is certainly a lot better (at least for me) but during development it would be great to be able to just output fields in a very simple format to then generate trivial plots in gnuplot, python, matlab, etc. However, it may be that PR #830 is the solution to this.

In summary, I guess that if we had the time to come up with a simple tutorial for developers most of these things would be addressed. However, having such a tutorial would probably require quite a bit of maintenance.

AntoineMazuyer commented 3 years ago

@ValarcherL you may want to participate on it as you are maybe the newest developer in GEOSX, and you arrived with a nearly complete doc

ValarcherL commented 3 years ago

I think the majority of the problems I encountered have already been mentioned but here is what I think troubled me the most:

1 Classes Even if the instanciation of a given derivation from a Class is well documented thanks to the example in the Developer Guide(https://geosx-geosx.readthedocs-hosted.com/en/latest/docs/sphinx/developerGuide/KeyComponents/AddingNewSolver.html), the documentation on Manager/Base classes doesn't help understand the implication of those in the whole process, in addition with the way to create and register Managers. Maybe it isn't critical are those are involved only for the addition of functionalities however.

2 Overview of the process This one links a bit with the previous as the main problem is the lack on information on the ProblemManager and the calls it makes to orchestrate all the other Managers. I think a small page dedicated to it and the other Managers on the Developer Guide, could help understand how the whole Group architecture is obtained and managed. https://github.com/GEOSX/GEOSX/blob/e107210dbd9737394988695d2e6af5e8df48c623/src/coreComponents/managers/ProblemManager.cpp#L156-L172

3 Links between Group I think it is never really explained how to recover a group from another group in the code (like for example the Output from Event). In order to do this there is hopefully lots of codes that can just be pasted. It may be problematic if some people don't use this functionality just because they never really get how to implement it, given that it might be not correctly understood at first glance by a user due to the very similar name of the parameters used. https://github.com/GEOSX/GEOSX/blob/e107210dbd9737394988695d2e6af5e8df48c623/src/coreComponents/managers/Events/EventBase.cpp#L143-L156

4 Global complexity of the code Some parts of the code can be really difficult to read due to the lot of abstraction used. At some point the only clue for what the function is supposed to do is the name of it, but it doesn't really help understand all the inner mechanism. I think this problem also affect the error log when often the important information is flooded with geosx:: and other dataRepository::. For example the research of the stencil creation and the required parameters could be consequently reduced by explaining their construction and use, either with inline comment or Developer Guide.

5 Listing of all types that should be used The code uses a lot of useful aliases and classes(localIndex, globalIndex, Path...) in replacement of the usual one. It could be good to list them to help to harmonize the coding and provide useful tool to prevent new users from coding what has already been done.

6 Overview of the repository It could help to have a small map of the folders and subfolders as the location of some files is not really obvious. To go even further a link of those files and folders with the corresponding tags that will be encountered in the xml file could reduce even further the learning time.

To summarize I think that the main problem might be the lack on information on some details that might be useful for the user. However I'm not really sure that all the things I've listed will be encountered by other people as most of those occurs when trying to implement new behavior, like importing graphs. To put it in even fewer words I think the main default of the code, which may also be its biggest asset, is the hierarchized and abstract structure of the whole. On one hand it helps, as it has been said like three times before, to develop our own chunk of code by copying previous work, on the other hand it creates a really dense code that is hard to be documented effectively at 100%. In addition I think the help of Antoine and Hervé saved me a lot of time to enter in the coding phase, and even after, and it might be difficult for someone to start coding without knowing where to look at.

juliatcamargo commented 3 years ago

Hi Randy,

This is the first time that I work on such a large code. I don't think I'm the best person to give relevant feedback, but I can tell you about my short experience.

So far I've implemented and tested a constitutive model for solids in Geosx. I applied the same strategy as many mentioned before, that is I looked up existing code and adapted it. I don't think I would have managed to start from scratch and I will highlight 2 parts of the code that make me say so:

  1. When constructing the model we use something like

    DruckerPrager & cm = *(constitutiveManager.GetConstitutiveRelation<DruckerPrager>("granite"));
    cm.AllocateConstitutiveData( &disc, numQuad );
    DruckerPrager::KernelWrapper cmw = cm.createKernelWrapper();
  2. We have to define a function called DeliverClone

    DruckerPrager::DeliverClone( string const & name,
                                    Group * const parent,
                           std::unique_ptr< ConstitutiveBase > & clone ) const

    I know there is a good reason for it, but it is not clear for me what this function does specifically.

For some cases a short explanation would be sufficient to give a general idea on the importance of a function, and maybe just an in line comment would help. For other cases, I agree that tutorials are a good way to guide someone to learn the code, and I think it worth the effort of doing it. As I get more familiar with the code, I hope to contribute with more ideas here.

rrsettgast commented 3 years ago

Hello everyone. I just wanted to comment that this is great feedback. It will certainly help us chart a plan to improve the developer experience. Thank you very much!

TotoGaz commented 3 years ago

I personally think that the entry ticket for GEOSX is quite high. Which is not bad by itself: things rarely come for free and it may be the price for performance.

But that does not mean that we cannot improve here or there the codebase to make it more accessible. Also, I do not think that all the elements of GEOSX are involved in performance. For those elements, readability and maintenance could come first.

Note that

Being “intuitive” or “hard” is a relative concept. I’ve tried to provide clear examples and facts about all this nevertheless.

I could see 3 categories of situations that result in a degraded experience as per my GEOSX "developper" experience. Fortunately I think it would not be too complicated to technically fix all of them if we want to.

Loose visibility constraints (at multiple levels)

To much `public` For multiple reasons (ours or not), the visibility (`private`/`protected`/`public`) is defined too loose.

 When I was documenting some classes, I could quite often find `public` members that should become `private`. 
It is more important than it looks because when you do not know what to do with an instance, having too many (irrelevant) options reduces the signal/noise ratio and increases the entry ticket.
 Moreover, this is a very crucial question when it comes to refactoring: the more connections we have the more complicated it is. 

Action would be to reduce all of these too visible members one by one.
`Group` on top Another issue which has its effect on the entry ticket of GEOSX, and I’ve already been talkative about it (see old feedback https://github.com/GEOSX/GEOSX/pull/753) is to have exposed `dataRepository::Group` on top of most of our classes.
 It comes with ~100 `public` members functions, which is quite impressive for a new comer. 
Furthermore, using most of them any old how (_i.e._ like a new comer) will surely lead to a huge mess in GEOSX.

 As action I would strongly suggest to check what are those `public` `Group` member functions actually used outside of the `Group`/`ExecutableGroup` stuff to see what we really need on the business side.
 And then check what we can do with this.

 Those who are interested can also have a look at the extrinsic mesh data pattern that is being spread (https://github.com/GEOSX/GEOSX/pull/959 and https://github.com/GEOSX/GEOSX/pull/1090) and the way we use `Group` to serve this kind of data registration for the end developer.

Implicit exposition of internal implementation

 See PR https://github.com/GEOSX/GEOSX/pull/1071 about `Event forecast`: when reading `if( subEvent->GetForecast() <= 0 )` you must dig into the code to understand what is going on. But an equivalent `if( subEvent->isReadyForExec() )` is smoother for anyone facing the `Event` pattern for the first time.
 Adding an extra business layer adds some information at no runtime additional cost.

 Another example. If I refactor `PhysicsSolverManager::m_gravityVector`, I think (please make me a liar if not) that I will have to refactor to the user input xmls. As a developer, I think these side effects are difficult to cope with.

 There is no specific action related yet but I think we should think about it.

Slightly different IT mainstream wording and concepts, slightly off standard ways of doing

Nothing wrong in this section but comments about the “intuitive” and how to make code dev more “intuitive”.

 Sometimes it requires me some extra effort to handle some of the concepts in GEOSX because they are not exactly what some mainstream IT engineer would do. It's only my opinion.

`Pack`/`Unpack` For example, if I understand correctly, the `Pack`/`Unpack` functions are used to build convert our instances into raw binary the data to be sent over the network.
 These actions are more often named `Serialize`/`Deserialize` in mainstream IT.
 While `Packing` would more be aggregating raw data.
 Is `Pack` and `Unpack` the de facto wording for such operations in HPC? Am I just wrong?
Functional programming patterns

 Functions family like ``` template< typename T0, typename T1, typename ... CASTTYPES, typename CONTAINERTYPE, typename LAMBDA > static bool applyLambdaToContainer( CONTAINERTYPE & container, LAMBDA && lambda ) ``` basically filters containers on cast types `T0`, `T1`… and apply `lambda` on it. We implement the standard functional pattern [filter](https://en.wikipedia.org/wiki/Filter_(higher-order_function)) and some kind of [map](https://en.wikipedia.org/wiki/Map_(higher-order_function)) (with specific `void` return type). We aggregate them into a unique function, why not. Basically, providing an interface for a `Predicate p` would match exactly the well know functional programming `filter` pattern and would be the most generic interface possible. And then we could decouple `Group` from the filters the user wants to implement too. In the `c++ 20` we’ll be able to use standard `c++` `concepts` https://en.cppreference.com/w/cpp/named_req/Predicate and https://en.cppreference.com/w/cpp/concepts/predicate ; standardizing our code, proving we are in the correct direction. More standard would help new comers; and always mean less maintenance, less documentation on our side. 

I don’t quite know how not to expose `Group` outside though.
Simple `c++` iterators 

In issue https://github.com/GEOSX/GEOSX/issues/1089 there was a question about iterating on `subGroups` of a given type (say `Solvers`). Currently there is this solution ```c++ pbManager->forSubGroups([&](const SolverBase & solver) {… ``` or ```c++ this->forSubGroups< EventBase >( []( EventBase & subEvent ) { subEvent.GetTargetReferences(); } ); ``` Which may not be the easier nor intuitive way to do. I would love some _intuitive_ ```c++ for (const SolverBase & solver: pbMgr->getSolvers()) ``` or ```c++ for ( const EventBase & subEvent: this->getSubEvents() ) { subEvent.GetTargetReferences(); } ``` We could define a common smooth way to iterate on our items and generalize it as a coding rule.


Small pebbles in the shoe when developing

Direct casts after function call/loose typing Interface regularly awaits `Group` but “of a certain type”, this type being declared as the argument’s name. E.g. often functions taking `dataRepository::Group * domain` are waiting for a `DomainPartition` (see https://github.com/GEOSX/GEOSX/pull/753/files e.g.). As a developer, when I face this kind of small details, I always ask myself if I do not miss something. If there is no reason, we should avoid these situations. 
 Fixing this would make our code more robust and more intuitive too. For most of these case, we have a `class` for each concept (`DomainPartition`, `ProblemManager`)! And if not, let’s build them. Other examples: ```c++ void PoroelasticSolver::RegisterDataOnMesh( dataRepository::Group * const MeshBodies ) { for( auto & mesh : MeshBodies->GetSubGroups() ) { ElementRegionManager * const elemManager = mesh.second->group_cast< MeshBody * >()->getMeshLevel( 0 )->getElemManager(); ``` ```c++ void PoroelasticSolver::InitializePostInitialConditions_PreSubGroups( Group * const problemManager ) ``` And here an example in a lambda function, making it more complicated to deal with (`Group * subRegion `). ```c++ fsManager.Apply( time + dt, domain, "ElementRegions", FlowSolverBase::viewKeyStruct::pressureString, [&]( FieldSpecificationBase const * const fs, string const &, SortedArrayView< localIndex const > const & lset, Group * subRegion, string const & ) -> void { arrayView1d< globalIndex const > const & dofNumber = subRegion->getReference< array1d< globalIndex > >( presDofKey ); arrayView1d< integer const > const & ghostRank = subRegion->group_cast< ObjectManagerBase * >()->ghostRank(); ```
Naming convention are not respected in our whole codebase.
 Our member functions can be `group_cast`, `GetToto`, `getTutu` (underscores are quite rare). Also, you want to know the accessible attributes, you type `get` (and `Get`!) then you ask for completion? Bad luck, you were asking for `ObjectManager`: `get` is not used for getters. :wink: This gives bad taste for something easy to fix. We should define a clear rule and correct the exception everywhere. E.g. I like the rule where the first word of a function must be a verb: there must be an action and it is often quite descriptive. We should patch this and define a guardian in our CI. It appears to me far more important than indentation (see comment https://github.com/GEOSX/GEOSX/issues/1076#issuecomment-665200023)
Committing generated files (schema xsd, doc rst)

 IIUC we generate some code and we commit it as part of the source code.
 Plus we built in source tree and not in build directory.

 As a developer, it adds some noise to my `git` and I would find GEOSX more intuitive to develop in without this.
 Committing generated data is generally considered as bad practice.
28 parameters functions 

We do have functions with extremely long list of arguments.
 I understand those are special cases, but could we handle this differently?
 Could some carefully designed `c++` classes (like an enhanced version of `CompositionalVarContainer` in https://github.com/GEOSX/GEOSX/blob/develop/src/coreComponents/constitutive/fluid/MultiFluidUtils.hpp ) help with it?

TotoGaz commented 1 year ago

Some of the comments have been converted into actions in their own tickets. Closing this issue as their is no more action awaited on it.