COSIMA / om3-utils

Mozilla Public License 2.0
0 stars 2 forks source link

Package structure and scope #9

Open micaeljtoliveira opened 6 months ago

micaeljtoliveira commented 6 months ago

This looks great @micaeljtoliveira. My first thought is that these tools are sufficiently general and useful beyond OM3 to justify moving them into their own package. Something to do now, later or never?

Since you asked for it, here's an attempt at a proposed restructure that I think makes things clearer for users and developers. Of course, this is all a matter of opinion so feel free to ignore all or parts of it:

om3utils/
├── __init__.py
├── config_file
│   ├── __init__.py # import all read_*.py and write_*.py functions
│   ├── mom6_input.py
│   ├── nuopc_config.py
│   └── payu_config_yaml.py
├── profiling
│   ├── __init__.py
│   ├── parser
│   │   ├── __init__.py # import ESMFProfilingParser and FMSProfilingParser
│   │   ├── base.py # contains ProfilingParser from profiling.py
│   │   ├── esmf.py # contains all of esmf_profiling.py and esmf_trace.py
│   │   └── fms.py # contains all of fms_profiling.py
│   └── analysis.py # contains all of profiling_analyses.py and parse_profiling_data from profiling.py
└── utils.py

Then the profiling tools could be used as follows:

from om3utils.profiling.parser import ESMFProfilingParser
from om3utils.profiling.analysis import (
    parse_profiling_data, 
    scaling_efficiency
)

stats = parse_profiling_data(
    run_dirs,
    ESMFProfilingParser,
    varname,
    getvar
)
eff = scaling_efficiency(stats)

You could even import all the parsers and analysis functions in profiling.__init__.py so that one does not need to go through the parser and analysis subpackages.

(Note I'm not really sold on the name of the config_file subpackage)

Originally posted by @dougiesquire in https://github.com/COSIMA/om3-utils/pull/3#pullrequestreview-2051568344

micaeljtoliveira commented 6 months ago

@dougiesquire Thanks again for the comments!

My first thought is that these tools are sufficiently general and useful beyond OM3 to justify moving them into their own package. Something to do now, later or never?

Yes, definitely more general than OM3, both the profiling and the config files. So at least we should consider renaming it to something like access-utils or access-pyutils? (This is actually a more general question: we are developing lots of ancillary code and tools for the ACCESS models. How should these be organized? Should this be discussed at a higher level?)

About splitting it up, I'm a bit hesitant. I see the value of having more modular and more focused packages, but on the other hand having lots of small packages increases the maintenance burden and dependency hell for users. Not easy to find the right balance... Currently there are less than 1000 lines of python in this repo (tests excluded), so maybe we can leave as is for now?

Since you asked for it, here's an attempt at a proposed restructure that I think makes things clearer for users and developers.

I like your proposal, but seems to go against recommended best practices? Although 3 levels is probably still okay?

You could even import all the parsers and analysis functions in profiling.init.py so that one does not need to go through the parser and analysis subpackages.

This would reduce the nesting by one level, right? So I would consider it a must have.

dougiesquire commented 6 months ago

Yes, definitely more general than OM3, both the profiling and the config files. So at least we should consider renaming it to something like access-utils or access-pyutils? (This is actually a more general question: we are developing lots of ancillary code and tools for the ACCESS models. How should these be organized? Should this be discussed at a higher level?)

About splitting it up, I'm a bit hesitant. I see the value of having more modular and more focused packages, but on the other hand having lots of small packages increases the maintenance burden and dependency hell for users. Not easy to find the right balance... Currently there are less than 1000 lines of python in this repo (tests excluded), so maybe we can leave as is for now?

Yeah, I think we probably need to discuss this more broadly. I like access-utils. I'm not sure the py adds much, especially since there may end up being a CLI.

I like your proposal, but seems to go against recommended best practices? Although 3 levels is probably still okay?

Yeah, I probably tend to overdo the level of nesting. In this project in particular, there's a tendency towards more nesting because the scope is quite broad and it's helpful (to me at least) to have namespaces for the different "functionalities". We could do away with the parser subpackage? E.g.

om3utils/
├── __init__.py
├── config_file
│   ├── __init__.py # import all read_*.py and write_*.py functions
│   ├── mom6_input.py
│   ├── nuopc_config.py
│   └── payu_config_yaml.py
├── profiling
│   ├── __init__.py # import Parsers and analysis functions
│   ├── base.py # contains ProfilingParser from profiling.py
│   ├── esmf.py # contains all of esmf_profiling.py and esmf_trace.py
│   ├── fms.py # contains all of fms_profiling.py
│   └── analysis.py # contains all of profiling_analyses.py and parse_profiling_data from profiling.py
└── utils.py

Or even put all the ProfilingParsers in their own submodule (though this may not scale very well...)?

om3utils/
├── __init__.py
├── config_file
│   ├── __init__.py # import all read_*.py and write_*.py functions
│   ├── mom6_input.py
│   ├── nuopc_config.py
│   └── payu_config_yaml.py
├── profiling
│   ├── __init__.py # import Parsers and analysis functions
│   ├── parser.py # contains ProfilingParser from profiling.py, all of esmf_profiling.py and esmf_trace.py and all of fms_profiling.py
│   └── analysis.py # contains all of profiling_analyses.py and parse_profiling_data from profiling.py
└── utils.py

Please note I don't feel strongly about any of this.

PS. @aidanheerdegen may have thoughts and opinions about all of this as he has interest in at least the profiling tools.

micaeljtoliveira commented 6 months ago

In this project in particular, there's a tendency towards more nesting because the scope is quite broad and it's helpful (to me at least) to have namespaces for the different "functionalities".

If we broaden the scope beyond OM3, then I think it would made sense for the top-level namespace to be an "organization" namespace, with functionalities as second level:

access/
├── __init__.py
├── config_files
│   ├── __init__.py # import all read_*.py and write_*.py functions
│   ├── mom6_input.py
│   ├── nuopc_config.py
│   └── payu_config_yaml.py
├── profiling
│   ├── __init__.py # import Parsers and analysis functions
│   ├── base.py # contains ProfilingParser from profiling.py
│   ├── esmf.py # contains all of esmf_profiling.py and esmf_trace.py
│   ├── fms.py # contains all of fms_profiling.py
│   └── analysis.py # contains all of profiling_analyses.py and parse_profiling_data from profiling.py
└── utils.py

Then one could do:

from access.profiling import ESMFProfilingParser
from access.profiling import (
    parse_profiling_data, 
    scaling_efficiency
)

stats = parse_profiling_data(
    run_dirs,
    ESMFProfilingParser,
    varname,
    getvar
)
eff = scaling_efficiency(stats)

One thing I like with this approach is that it's easy to split the package later on without changing the namespace. It would also make it easy to add existing packages to the access namespace, like for example esmgrids, if we want to.

aidanheerdegen commented 6 months ago

Would welcome any thoughts you might have @jo-basevi or @truth-quark