Closed TonyB9000 closed 2 months ago
The update looks good to me. Thank you for addressing the review comment.
OK, I will wait for Tom's review next (and continue testing with the dsm_generate_CMIP6.sh driver).
OK, I will wait for Tom's review next (and continue testing with the dsm_generate_CMIP6.sh driver).
Can you paste what the log output looked like before and after this PR?
I'm aware that you updated the logger message outputs to minimize verbosity. However, I suggest keeping at least the module name and the exact line of code because it helps immensely with debugging.
@tomvothecoder For the most part, I did not change the log-output content. What I did was reduce the number of different "logger-setup" functions being called (now only the one in "_logger.py" is used, and named it "e2c_logger" to avoid conflicts.
The log output itself has not (much) changed. But where the messages get logged (default logfile, console stream, etc) may have, due to changes in "propagate", and which handlers get registered with the logger, etc. CMOR (via "cmor.setup") establishes its own handlers into the logger, and I cannot but control the name of the log-file ("cmor_logs").
CMOR (via "cmor.setup") establishes its own handlers into the logger, and I cannot but control the name of the log-file ("cmor_logs").
I created a new issue to explore including CMOR logs with the main e3sm_to_cmip
log file (and removing CMOR logs if it is redundant). https://github.com/E3SM-Project/e3sm_to_cmip/issues/266
From what I've seem, the "cmor_log" messages involve issues of CMIP6 conformance in the metadata. I tend to prefer to keep those (glaring) messages separate from the regular e2c messages - they hurt my eyes and are not amenable to machine-read. I have asked that they (CMOR) provide a fuller API interface to their log setup/formatting... just hoping...
From what I've seem, the "cmor_log" messages involve issues of CMIP6 conformance in the metadata. I tend to prefer to keep those (glaring) messages separate from the regular e2c messages - they hurt my eyes and are not amenable to machine-read. I have asked that they (CMOR) provide a fuller API interface to their log setup/formatting... just hoping...
Yes I agree, the CMOR log messages aren't formatted well. I think it is a good idea to combine the e3sm_to_cmip and CMOR logs because it is easier to debug a single log file. Currently, when e3sm_to_cmip
crashes, you have to parse separate log files to see if the error is due to CMOR and what the CMOR error is (most of the time unclear). We can still keep separate CMOR logs by variable too.
@tomvothecoder I think I covered most of the comments, save for a few. But I made the changes locally and cannot push them until I figure out the right way to do it.
@tomvothecoder @chengzhuzhang
I was finally able to push through the "merge conflicts", and address most of the recommendations and comments Tom made. There are still some residual issues. Hopefully these can now stand-out and be more carefully addressed.
I will need to do more testing before any merge-to-master occurs, but at least the major changes are now secured.
@tomvothecoder Not sure where to put this comment . . .
Regarding the "cmor_logs" output being combined with other log output, I am not sure how to make that happen.
The new "_logger" (if "to_logfile" is requested) will add the handler to write to the default log (e3c_logs) unless a different logfile is specified.
However, the handler.py "setup_cmor_module()" invokes "cmor.setup( ..., logfilename)" directly, and cannot access the _logger function nor learn its default logfile. It assigns a logfile logpath = os.path.join(cwd, "cmor_logs")
if it does not inherit a logfile passed in "from above" via the parent function "setup_cmor_module()", which would receive the logfile from its parent function "cmorize()", which is itself obtained (obtusely!) from its parent function call "do_dict()", a method attached to (utils.py: VarHandler()) as in "VarHandler( . . . ).to_dict(), itself called from _get_handlers_from_yaml(), called in "_get_handlers_by_var(), which is itself called by BOTH "load_all_handlers()" and "derive_handlers()" (each of which is called in main: _get_handlers() under different conditions).
If you know the intended way to pass the default (time-stamped) logfile down through that call chain, I would be happy to apply it. I could try to pass "logfile=DEFAULT_LOG" into the main class at "self.handlers = self._get_handlers()", but if that logfile is overridden, there is no opportunity to learn the override unless it is somehow made global in _logger.py
Perhaps I should add this comment to issue #266?
@tomvothecoder I think I've addressed each of the changes you've requested - plus added "feature issues" you had suggested. If you would like to re-review, or not, let me know.
Perhaps I should add this comment to issue #266?
My comment about combining logs was a separate suggestion and should not be included in this PR. You can move your comment above to #266.
@tomvothecoder I think I've addressed each of the changes you've requested - plus added "feature issues" you had suggested. If you would like to re-review, or not, let me know.
I will do a re-review of this PR. There are also some failing unit tests that need to be fixed.
Does this PR need to be merged any time soon? I'm assuming this is a nice-to-have enhancement and not a high priority/blocker, which means we can take our time reviewing. Correct?
@tomvothecoder There is no rush as far as I am concerned - take your time.
And of course, let me know where there are things I have neglected to address.
@tomvothecoder Question: If I have accepted a pile of "commit changes" here on the web, but also want to edit and push larger changes from my local repo, how do I avoid clobbering one or the other? Can I "pull" from my remote PR to update my local repo first, before advancing new edits?
NVMD - looks like "git pull" worked to fast-forward.
@tomvothecoder This should be VERY close now, to address your comments. I need to add the issue for "num_sucesses" vs "num_failures" in main.py, as this affects the conditions of exit. It may be necessary to make it another command-line option (exit on any failure, or return non-zero status on any failure, etc).
@tomvothecoder I think I've addressed all of your comments. Could you re-review the changes? Thanks!
Thank you @TonyB9000. I'm doing another review right now.
A few notes:
main
vs. with this PR? I would like to see what the log output looks like without needing to run e3sm_to_cmip on this branch. For future pull requests, can you make sure to split your changes into smaller, related changes? Combining many changes (related or unrelated) in a pull request makes it harder to review and test. The CMOR setups and logger changes should have been in separate PRs. Other example of changes that should have been in separate PRs:
delete_tempfiles
flag in mpas.py
do_pbar
command line flagglobal logger
from Python will be, if any. From my experience, I have not needed to do this nor have I seen it commonly done in Python packages (is it good practice?). If it seems to be working for e3sm_to_cmip
for our use cases (no repeated log outputs, centralized logging), then I'm fine with it. This is just a forewarning in case something happens or it becomes unmaintainable/problematic.3. I'm not sure what the long-term implications of referencing/overriding the
global logger
from Python will be, if any. From my experience, I have not needed to do this nor have I seen it commonly done in Python packages (is it good practice?). If it seems to be working fore3sm_to_cmip
for our use cases (no repeated log outputs, centralized logging), then I'm fine with it. This is just a forewarning in case something happens or it becomes unmaintainable/problematic.
Adding to my comment above, mypy
caught all of these issues related to the logger
variable. I'm somewhat concerned with whether referencing the global logger
is good practice (post about this). Please run pre-commit install
in your e3sm_to_cmip repo and run pre-commit --run all-files
to see these issues too.
e3sm_to_cmip/_logger.py:45: error: Cannot resolve name "logger" (possible cyclic definition) [misc]
e3sm_to_cmip/_logger.py:45: note: Recursive types are not allowed at function scope
e3sm_to_cmip/util.py:24: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:502: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:512: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:528: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:552: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:561: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:571: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:645: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:649: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/util.py:662: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/handler.py:218: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:248: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:256: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:279: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:300: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:342: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:684: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:688: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:698: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:701: error: Item "None" of "Any | None" has no attribute "info" [union-attr]
e3sm_to_cmip/cmor_handlers/handler.py:711: error: Item "None" of "Any | None" has no attribute "error" [union-attr]
e3sm_to_cmip/cmor_handlers/utils.py:22: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:22: error: Unexpected keyword argument "set_log_level" for "_logger"; did you mean "log_level"? [call-arg]
e3sm_to_cmip/cmor_handlers/utils.py:83: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:125: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:221: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/cmor_handlers/utils.py:227: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/_logger.py:10: note: "_logger" defined here
e3sm_to_cmip/__main__.py:110: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:153: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:154: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:155: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:156: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:157: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:158: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:159: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:160: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:161: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:213: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:228: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:229: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:230: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:232: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:238: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:656: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:657: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:658: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:674: error: Unused "type: ignore" comment [unused-ignore]
e3sm_to_cmip/__main__.py:700: error: Incompatible types in assignment (expression has type "dict[str, Any]", variable has type "str | None") [assignment]
e3sm_to_cmip/__main__.py:708: error: Unsupported target for indexed assignment ("None") [index]
e3sm_to_cmip/__main__.py:710: error: Unsupported target for indexed assignment ("None") [index]
e3sm_to_cmip/__main__.py:716: error: Unused "type: ignore" comment [unused-ignore]
e3sm_to_cmip/__main__.py:752: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:772: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:773: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:774: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:836: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:861: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:865: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:878: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:970: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:972: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:985: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:989: error: Name "logger" is not defined [name-defined]
e3sm_to_cmip/__main__.py:990: error: Name "logger" is not defined [name-defined]
Found 65 errors in 5 files (checked 6 source files)
I think an easier way to achieve a single global logger is to instantiate the logger once in _logger.py
then import it to the various other modules.
Example (source):
You could make your own logging "module" which instantiates the logger, than have all of your code import that instead. Think:
logger.py:
import logging log = logging.getLogger('')
codeA.py:
from logger import log log.info('whatever')
codeB.py:
from logger import log log.warn('some other thing')
I suggest we split up this PR into smaller PRs and merge the ones that are ready. I’m not 100% confident in the logging portion of this PR yet and think it can be done more cleanly/easily. It is also an enhancement and not a critical need.
@tomvothecoder Regarding your example for "achieve a single global logger", this was not my intent (although not a bad idea). I assumed different modules might want to configure logging differently (stream versus file, and propagate/not), and wanted a single and centralized call (found in _logger.py) to accomplish the work, based upon configuration choices. Also, I had not created the "_logger.py" module myself, and did not want to deviate too far from what I thought was its functionality.
The main changes were to reduce the two functions (one for "root logger" one not) into a single function, and to provide "only console, only logfile, or both" as options.
But I am certainly up for another approach. I did try to kill multiple birds with one stone - perhaps unwise.
(ASIDE: I found it strange to have "main" be reduced to a single class that encapsulates everything, reducing the entire "main()" function call to "app =
@tomvothecoder You asked: "Can you please paste the log output on main vs. with this PR?"
I can do this, but a trial for atmos, land or ocean may produce different output, sue to their differences in handling. I'll work this up and provide the output.
Regarding your example for "achieve a single global logger", this was not my intent (although not a bad idea). I assumed different modules might want to configure logging differently (stream versus file, and propagate/not), and wanted a single and centralized call (found in _logger.py) to accomplish the work, based upon configuration choices. Also, I had not created the "_logger.py" module myself, and did not want to deviate too far from what I thought was its functionality.
The main changes were to reduce the two functions (one for "root logger" one not) into a single function, and to provide "only console, only logfile, or both" as options.
I have a bit more time and will look into a solution to logging. I implemented the _logger.py
module to replace the print statements being used before (e.g., the colored ones Sterling implemented).
(ASIDE: I found it strange to have "main" be reduced to a single class that encapsulates everything, reducing the entire "main()" function call to "app = " and then "return app.run()". Is this a common pattern?)
Can you clarify how you find this design pattern strange? From my experience, this is a common API design pattern for software programs. The core program is a class (E3SMtoCMIP
in this case) that encapsulates related methods and attributes related to the e3sm_to_cmip
"run". This design allows us to store the CLI configurations as Python class attributes that can be referenced throughout the class, rather than needing to pass them around constantly between standalone functions (like how it was before), resulting in a much cleaner codebase. Also read this page for testing and debugging using this API design, which wasn't possible before.
I suggest we split up this PR into smaller PRs and merge the ones that are ready. I’m not 100% confident in the logging portion of this PR yet and think it can be done more cleanly/easily. It is also an enhancement and not a critical need.
To split up this PR, you can:
@tomvothecoder @chengzhuzhang Having given this consideration, I am inclined to abandon this branch, create two new branches off of main (obtained by a fresh git-clone),
branch: consolidate_cmor_setup_273
branch: redesign_logger_274
and then - reviewing each logged commit - re-introduce only the "cmor setup" or only the "logfile" work upon each branch. That way, if any commit happened to "overlap" (contain both sorts of changes), they can be easily kept separate.
We can then test these changes separately as well.
This seems the cleanest (and safest) approach, compared to branching off of a "muddy" branch and trying to selectively revert.
We can then test these changes separately as well.
This seems the cleanest (and safest) approach, compared to branching off of a "muddy" branch and trying to selectively revert.
Sounds good with me. Thank you Tony.
@tomvothecoder @chengzhuzhang I am closing this issue/PR as abandoned, since it has been split into issue #275 (Consolidate cmor setup, since merged to master) and issue #274 (Redesign logging, soon to be PR'd).
Description
This update to e2c addresses two areas of consolidation: setup_cmor functions (calls to cmor.setup) and e2c logging configurations. Other minor bug-fix and cleanup conducted. All changes tested against a sample from each variable type (Amon 2D, Amon 3D, day, 3hr, CFmon, Lmon, LImon, Omon, SImon).
CMOR SETUP CONSOLIDATION:
The motivation for cmor.setup reconfigurations is that this cmor function was called from 7 separate locations:
Aside from unnecessary replication of codes, each call to cmor.setup invokes a cmor-internal "logger" setup, whose only external accessibility is the name of the logfile to supply. There is no ability to alter its propagate(True/False), date-format or other attributes. (I believe it is responsible for the "duplicated log messages" problem, as only those messages with the CMOR date-format appear to be duplicated.)
Now, there are only TWO functions that invoke cmor.setup: util.setup_cmor(), and handler._setup_cmor_module(). The latter is applied to every handler managed through the "handlers.yaml" construct. The former handles all other variables. The function in mpas.py has been eliminated.
LOGGER CONSOLIDATION:
Likewise, the motivation in consolidating the logger functions is that python logging is "global" - different modules adding stream or file handlers to their loggers accumulate across all logging.
Now, there is a single "e2c_logger()" function in _logger.py, and it is called by ANY (code-accessible) module in e2c that requires logging. It can be configures to return the root logger, or a named-logger, to apply logging to a file or to console or both, and to propagate or not.
FIXING the problem where "--help" and "--version" cause a cmor_logs directory to be created in the user's current directory required some main-line refactoring. Previously, modules like mpas.py, handler.py, etc each would call their logger setup in the global space. That global space is activated whenever a module is "imported", irrespective of whether any functions defined within have been called. And ALL of these imports occur BEFORE we get a chance to parse the main command-line args (and discover that only --help or --version was being invoked.)
Now each of these modules has its own function to be called to set-up its logger. These are imported into "main" and invoked only AFTER arg-parsing is completed (and not at all for --help and --version calls).
CHANGES BY MODULE:
Aside from virtually ALL modules now using _logger:e2c_logger() for logging operations, the following changes have been made:
MINOR CHANGES:
Replaced some - but not all ".format()" string variable formatting with f"{va}" format style.
Checklist
If applicable: