cositools / cosipy

The COSI high-level data analysis tools
Apache License 2.0
3 stars 16 forks source link

Change print() to logging.info() #48 #178

Closed krishnatejavedula closed 1 month ago

krishnatejavedula commented 1 month ago
ckarwin commented 1 month ago

@krishnatejavedula I went over all the changes in this PR and everything looks good. I tested the code for the dataIO using new unit tests that I've prepared locally, and everything works fine. I also tested the dataIO tutorial notebook with no problems.

The only issue is that with the current (default) configuration, all of the logging info is effectively lost, i.e. it's not printed to terminal, nor saved to file. However, it is highly beneficial to have both. The terminal output is needed to update users on the flow of the analysis (when running from command line or in a notebook), and writing to a log file is useful for future reference.

Only the root logger needs to be configured, since all the lower level loggers at module level eventually forward their messages to its handlers. I think a good place for the root logger is in cosipy/config/configurator.py. So in this file I would suggest that you add the following lines:

import sys
logging.basicConfig(filename='cosipy.log', level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))

This will print all logging info to stdout, and save it to a log file called cosipy.log (in the working directory). Note that by default, the logging level will be INFO. In the future we can set this as an input, e.g. to allow users to set it to DEBUG if needed, or a lower verbosity (ERROR) if they don't want any logging info for some reason.

I'll get @israelmcmc in the loop here to see what he thinks, b/c this ultimately relates to issue #26, and having a standard logging across all modules.

After making the above changes, I think this PR can be merged.

ckarwin commented 1 month ago

Note that DataIO.py already reads the input yaml file using Configurator(), and so it automatically uses its logger config.

ckarwin commented 1 month ago

Actually will need to think about how to setup the root logger somewhere.

israelmcmc commented 1 month ago

Thanks for reviewing @ckarwin!

TL;DR: I think this can be merged now.

It's best practice that the root logger is not configured by the library but by the user (discussed here). Cosipy can be used as part of an even larger codebase, and who knows what they want to do then. This flexibility and customizability by the final user is actually the main reason to use logging instead of print statements.

By default, only warnings are printed to stderr (discussed here). The rationale, I assume, is that good code works like a black box, and you only want to be bothered if something goes wrong and needs attention. Also, the default is stderr and not a file because that way the user can redirect all the logs to a single place without having to worry about what the internal pieces of code are doing do.

That said, I agree that using the INFO level is beneficial for most of our purposes, especially at this point in the development. We can simply add the following to the notebooks and our executable scripts (applications):

import logging
logging.basicConfig(level=logging.INFO)

Our applications should have a -v option that controls the verbose, which will be translated to e.g. 0 = warning only, 1 = info, 2 = debug.

I rather not output to a file by default, because then:

  1. You won't see the result in a jupyter notebook cell, which is awkward when working with interactive notebooks
  2. For an executable scripts, you lose the ability to redirect stderr to a file of your choice with 2> in the command line.

It would be convenient for our application to have an optional flag to save the logs to a file though.

This should all be part of the global config yaml file as well.

And yeah, I should write all of this down in a style guide so it can be discussed and followed.

In any case, I think this is all for future PRs, I think this one can be merged now.

ckarwin commented 1 month ago

Thanks for the feedback, @israelmcmc! That makes a lot of sense. The reference is really nice as well.

The rationale, I assume, is that good code works like a black box, and you only want to be bothered if something goes wrong and needs attention.

Yeah, indeed, I get the point. But in other cases, I would argue that it's good to know what the code is doing. For example, when the dataIO is reading the tra file it can take a long time, and without any indication of what's happening, users might think that something is wrong. Indeed, that happened before adding some of the print statements.

That said, I agree that using the INFO level is beneficial for most of our purposes, especially at this point in the development. We can simply add the following to the notebooks and our executable scripts (applications):

Sounds good.

I rather not output to a file by default, because then:

You won't see the result in a jupyter notebook cell, which is awkward when working with interactive notebooks For an executable scripts, you lose the ability to redirect stderr to a file of your choice with 2> in the command line.

For example, fermipy writes a log file by default (and prints to terminal). You can output to both a file and stderr (or stdout) at the same time with the lines I wrote above, and copied below:

logging.basicConfig(filename='cosipy.log', level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))

Not sure about the executable though.

In any case, I think this is all for future PRs, I think this one can be merged now.

Yup, I agree. Thanks again!

israelmcmc commented 1 month ago

Yeah, maybe outputting to both a file and stderr is a good idea. I guess we care more about reproducibility than the regular software, and this might be a saver.

krishnatejavedula commented 1 month ago

Thank you @israelmcmc and @ckarwin for your feedback. This helps a lot. I am glad the PR is merged. I will be working on the rest of files mentioned in #48 and open a PR for them shortly so the issue could be closed.