MassimoCimmino / pygfunction

An open-source toolbox for the evaluation of thermal response factors (g-functions) of geothermal borehole fields.
BSD 3-Clause "New" or "Revised" License
46 stars 20 forks source link

Code health and technical debt #257

Closed ikijano closed 8 months ago

ikijano commented 1 year ago

Hi,

I have following this awesome project some time. The place where I'm working we use pre-calculated g-functions and I studied could we utilize this library for custom cases, but sadly it's not case in long time.

When I first time looked the pygfunction code I was slightly shocked because there is some quite a huge files and functions. I'm not python developer but I have used language in past, but I had hard time to follow the code and I had to give up. For curiosity I ran code analysis using codescene.io community edition against pygfunction (forked) code base: https://codescene.io/projects/33542/jobs/790698/results and results are interesting.

I see there is already raised issues to enhancement code quality:

Issues shows similar findings what analysis tool found but there is some other aspects what should be considered: https://codescene.io/projects/33542/jobs/790698/results/warnings/code-health-degradation

I would like to help but my python knowledge is not good enough and my understanding the subject is very limited.

j-c-cook commented 1 year ago

@ikijano Thank you for your constructive criticism.

This package implements some serious theoretical physical models. Looking at this code base and considering it complex is natural, I believe. The complexity and sheer depth of knowledge that is located in this package is impressive. I implemented one small piece of pygfunction in C++ a couple years ago, and the task was no walk in the park.

On top of the math expertise required here, the numpy package has been extensively utilized for computational speed. Therefore, a Python developer well versed in the native object containers could come here and need to educate himself/herself on NumPy lingo. The utilization of numpy here is impressive.

If you want to dive into the implementation, I encourage you to pick one aspect at a time. Small pieces here are feasible to be understood, but all at one time with no aim would likely not result in anything meaningful (i.e. is there existence without limitation?). Perhaps there is a need for a certain kind of documentation along with this package. Please do not hesitate to create new issues with a question or to start a discussion at any time. Feedback is really helpful to help better tailor this package for the next person.

We can always strive to have a more human readable implementation. The implementation vs. feature is a balancing act that is crucial. I say the balance is analogous to a physical foundation.

As a response to the code analysis you presented, I've proposed #258. Moving to a Python code formatter like black may be an improvement (though I really think the name of that package is horribly chosen). I also think we could implement some static checks in the build process using mypy.

ikijano commented 1 year ago

Thank you @j-c-cook. I don't expect these kind things to happen immediately it more like ultimate goal to make code also more readable and clean. I know sometimes it's necessary sacrifice readability to make code perform better.

I think there are some areas where it's relatively easy to improve code quality: Separating concerns like moving all visualization functionality away from core module to to own module and also trying to keep functions/methods/classes do one thing.

I'm glad there is tests in this project so any refactoring should be realive safe to do.

I would say doing something like #258 could improve quality if it done correctly. In my experience keeping same if-else structure in multiple places makes code harder to read and maintain.

MassimoCimmino commented 1 year ago

Thank you @ikijano. The codescene tool seems very useful. I have some reading to do to correctly interpret the output.

I see here that the gfunction module is perhaps the most problematic, and I would agree with that. Instead of moving the visualization out of the module (I would argue they belong inside the gFunction class), I suggest we explore the possibility of moving the solvers to a separate module.

To follow up on @j-c-cook's comment, methods implemented into pygfunction are very recent, and in the case of my own methods they are implemented at the same time as they are developed (i.e. as the research is conducted and the scientific paper written). This has the advantage of bringing the functionality very quickly to the community. The obvious disadvantage is that it is a "first" implementation of the methods and they aren't always optimal both in readability and efficiency. We do revisit our implementations as we get more experience, as the project grows, and in response to (always welcome) feedback such as yours.

The refactoring we did to the gfunction module in v2.x is pretty much on par with my current ability as a Python developer but I am always interested in improving. I would appreciate if you had any specific changes in mind or documentation to read to go along with your concerns.

MassimoCimmino commented 8 months ago

It seems we couldn't arrive to any specific tasks. We will re-open this issue later if needed.