ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime
Other
162 stars 207 forks source link

CIME refactor #4528

Open jasonb5 opened 11 months ago

jasonb5 commented 11 months ago

After extensive analysis of CIME’s codebase using the prospector tool, it is my recommendation that the code is in need of a refactor. With this, CIME’s codebase can be simplified, improve error handling, readability and maintainability, increase unit test coverage, remove any remaining model specific code, and convert CIME to a standard package/release cycle.

The prospector tool was used to perform the analysis, it analyzes Python code and outputs information about errors, potential problems, convention violations and complexity. The tool found 2,587 instances when scanning the CIME codebase. There were 115 instances of code being marked as complex according to the cyclomatic complexity.

The first goal of the refactor is to reduce the complexity of CIME’s codebase, this will significantly improve the readability and maintainability of the code. This first step will be to remove any unused code, followed by reducing the complexity of functions identified by the prospector tool. While the code is being refactored unit tests will be added to ensure the correct function. Currently we have 67% test coverage; 36% coverage from unit tests and 31% from end-to-end tests, I would like to get this at least above 80%. Some unit tests need to be fixed as they rely on previous test output, this is not a good practice as it makes running some individual tests impossible. Fixing these tests will allow the test suite to run individual tests in parallel, decreasing the testing time and better utilizing the resources from our automated testing. Next is to improve error handling within CIME, there are many instances where generic builtin python errors are raised and not caught leaving users unaware of the true cause. After improving error handling, there is still some model specific code that went unaddressed in the first pass and will be removed. Lastly the user facing code will be cleaned up so CIME can be used like a standard 3rd party package and distributed through PyPi and Conda. Projects will still be able to use CIME as a submodule but have an installable option. With releasing a more standardized package there will be more control in what features go out to the user and enhance CIME by providing the ability to utilize it within the python ecosystem.

jasonb5 commented 11 months ago

@jedwards4b @billsacks I'm looking to refactor CIME per the description above, I'd like to open this up for some discussion and get CESM's input.

jedwards4b commented 11 months ago

I am all for this in principle but am a little worried about it in practice. Do you think that this can proceed incrementally or would it be better to create a development branch in the repository? Should we go back to having regular cime meetings to discuss progress and issues? I would suggest an initial meeting, perhaps Wednesday or Thursday this week to discuss further.

ekluzek commented 11 months ago

@jasonb5 this tool you mention above sounds very useful, and very useful for other python projects we work on. Is prospector easy enough to use that I can just Google about it? Or could you give us a few notes on how to use it? You could just give a few notes here on how you used it for cime. Thanks for any insights you have. And thanks for posting about this.

Also agree with @jedwards4b that there should be some discussions on this.

jasonb5 commented 11 months ago

@jedwards4b This can all be done incrementally. To ensure CIME keeps functioning, I'll be filling in unit tests for each portion of code before I refactor, the goal is no disruptions to projects using it. I'll be tackling this all in small chunks so it shouldn't hold up any development outside of it either.

@rljacob @jgfouca I'm fine with restarting the regular CIME meetings, to discuss progress and issues.

@ekluzek Here's the prospector GitHub (https://github.com/landscapeio/prospector) and docs (https://prospector.landscape.io/en/master/). I've provided the .prospector.yml file I used when using the tool, with that it's just a matter of pointing the tool at CIME's code.

strictness: high
test-warnings: true
doc-warnings: false

ignore-paths:
  - doc

vulture:
  run: true

pylint:
  disable:
    - I
    - C
    - R
    - logging-not-lazy
    - wildcard-import
    - unused-wildcard-import
    - fixme
    - broad-except
    - bare-except
    - eval-used
    - exec-used
    - global-statement
    - logging-format-interpolation
    - no-name-in-module
    - arguments-renamed
    - unspecified-encoding
    - protected-access
    - import-error
    - no-member
gold2718 commented 11 months ago

@jasonb5, I am also all for this refactor, it sounds great. As a representative of another model using CIME, NorESM, I would like to ask that as plans become solidified, this issue is updated so that we can follow along (for those of us who are likely to have trouble attending the meetings).

jedwards4b commented 11 months ago

@gold2718 We've scheduled a meeting for 1pm Mtn on thursday - I know that's pretty late for you but would you like an invite?

gold2718 commented 11 months ago

@gold2718 We've scheduled a meeting for 1pm Mtn on thursday - I know that's pretty late for you but would you like an invite?

Yes please, I'll ask if I can stay up late :wink:

mvertens commented 11 months ago

I'd also like to participate. I think this is a great idea.

billsacks commented 11 months ago

This sounds great - thanks a lot, @jasonb5 !

I wanted to share a thought I've had about refactoring CIME. This would probably involve significantly more rework than what you're discussing here, so may be outside the scope of this issue, but it could be good to have an eye towards this if others agree that it's worth doing: There's a lot of functionality in CIME that would be useful to other projects but isn't easily used by other projects unless the adopt the entire Case Control System. Examples of this useful functionality are the abstraction of the queuing system and the management of inputdata. I've wanted to be able to use this kind of functionality in other tools (e.g., supporting tools for CTSM that are outside of the Case Control System usage), and I've been talking with Christina Holt (who leads the Unified Workflow project for NOAA's UFS) about the possibility of collaborating on tools like this - but they don't want the entire Case Control System. So I've been wondering if it would be feasible to refactor some of the CIME functionality to make it more easily usable in isolation.

Again, I'm not suggesting that you fold that into the work you're discussing here, but I do think it could be good to discuss some other big picture things so that any refactoring effort might be able to move us incrementally closer to where we want to be long-term.

jasonb5 commented 11 months ago

@billsacks I wouldn't be opposed to adding this as goal. I'll take a look and see how much effort it will take.

jedwards4b commented 11 months ago

I believe that CIME has a well defined but poorly documented API that applications outside of cime can use. For example here is an application that builds the model and sets up a particular ensemble configuration. How can we clarify and improve this API? As much as I try to advocate for it I find that there are very few people who use it.

jgfouca commented 11 months ago

Previous attempt at a big CIME refactor: https://github.com/ESMCI/cime/issues/3886 . Many of the items were implemented, some were not.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

jasonb5 commented 8 months ago

Here's a list of tasks for the refactor, feel free to propose additional items. I'll begin working on these between higher priority items.

Tasks are ordered by priority, highest at the top.

jgfouca commented 8 months ago

@jasonb5 , looks like a great list.

Improve argparse for all tools Review descriptions Group arguments

Does this include using **vars so that we don't have to fully list args everywhere?

jasonb5 commented 8 months ago

@jgfouca Yes I'll address all the areas that could take advantage of * and **.