Closed dchandan closed 9 months ago
I think this PR branches off the other one about the crawler. I'll wait for you to merge the first one before reviewing.
I've merged it now.
No, that's a good question. (i) I didn't know how to add a common file handler to the multiple loggers that are generated in the get_logger function, (ii) the log files can get very busy with other output, and I thought it might be useful to have separate, clean files with error data that I can then work on fixing (I could always just ack or sed the log files, but for the moment this just felt more useful for the next reason), (iii) I am hoping that we can write some logic for the ErrorLoader that allows us to feed this separate error file (whose format can evolve to include more information) into a new run of stac-populator so that the populator only works on these error files rather than all files in a catalog (even if data is not posted for existing items, working on an entire catalog is still slow because all items have to be crawled through).
Thoughts, comments, suggestions?
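On point (i), one possible approach is to attach a single file handler to a parent logger and let the child loggers propagate their records up to it. This is only a sketch: it assumes the loggers created in get_logger share a common name prefix, and the `stac_populator` prefix, the child logger names, and the file name below are all hypothetical.

```python
import logging
import os
import tempfile

# Hypothetical parent name; assumes get_logger() names its loggers
# "<prefix>.<something>" so that they are children of this logger.
PARENT_NAME = "stac_populator"


def attach_error_file(path):
    """Attach one ERROR-level file handler to the parent logger."""
    handler = logging.FileHandler(path)
    handler.setLevel(logging.ERROR)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logging.getLogger(PARENT_NAME).addHandler(handler)
    return handler


error_path = os.path.join(tempfile.mkdtemp(), "errors.txt")
handler = attach_error_file(error_path)

# Child loggers need no handler of their own: records propagate upward.
crawler_log = logging.getLogger(PARENT_NAME + ".crawler")
poster_log = logging.getLogger(PARENT_NAME + ".poster")
crawler_log.error("could not build item")
poster_log.error("POST failed")
crawler_log.warning("below ERROR, so not written to the error file")

# Detach and close the handler before reading the file back.
logging.getLogger(PARENT_NAME).removeHandler(handler)
handler.close()
with open(error_path) as f:
    lines = f.read().splitlines()
```

Because the handler level is ERROR, the shared file stays free of the busier INFO and WARNING output, which also addresses point (ii).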
It's possible to configure the LOGGER object to write both to the terminal and to a file on disk. So if that meets the requirements, it could be an option.
In another project I've used this:
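import logging

def init_log():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)  # let INFO and above reach the handlers
    fh = logging.FileHandler('log/run_MAGICC.log')  # the log/ directory must already exist
    fh.setLevel(logging.INFO)  # the file receives INFO and above
    sh = logging.StreamHandler()  # writes to stderr, i.e. the terminal
    sh.setLevel(logging.ERROR)  # the terminal receives ERROR and above
    logger.addHandler(fh)
    logger.addHandler(sh)
    return logger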
which logs ERRORs to the terminal, but stores all INFO messages to a file.
@dchandan A timely video that came up in my watch list today: https://www.youtube.com/watch?v=9L77QExPmI0 See JSON logs and extra parameter timestamps for some inspiration.
Thanks for the comments and the useful video on the log file. I do like the JSON lines output format shown in the video as it would make parsing the log files for issues much easier. I've now revamped the whole app logging to use the setup in the video.
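For readers who have not seen the video, a JSON lines formatter can be sketched roughly as follows. This is not the code that went into the PR, just a minimal illustration using the standard library; the `JSONLinesFormatter` class, the `dataset` and `stage` fields, and the logger name are all assumptions made for the example.

```python
import io
import json
import logging


class JSONLinesFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical extra fields, supplied through the `extra` argument.
    EXTRA_KEYS = ("dataset", "stage")

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over only the extra fields that were actually provided.
        for key in self.EXTRA_KEYS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


stream = io.StringIO()  # stands in for a log file on disk
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONLinesFormatter())

log = logging.getLogger("jsonl_demo")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output out of the root logger
log.addHandler(handler)

log.error("item creation failed", extra={"dataset": "example.nc", "stage": "create"})

# Each line parses independently, which is what makes filtering easy.
entry = json.loads(stream.getvalue().splitlines()[0])
```

Since every line is a standalone JSON object, errors can be selected by field with `json.loads` instead of pattern matching on free-form text.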
Is it good to merge now?
@fmigneault Ok, we can do that later. Kinda pressed for time rn, so we can circle back to this later. I'll make an issue of this, so it is documented.
Adding a feature to intercept errors encountered while (i) generating STAC items and (ii) posting STAC items to the server. Information about the dataset and the error it encountered is saved to separate log files. An example log file from a run of the stac-populator is included below.
stac-populator_CMIP6populator_errors_20240119-185007.txt
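As a rough illustration of the two interception points described above, and of how the recorded errors might later drive a re-run, here is a minimal sketch. It is not the implementation in this PR: `process`, `failed_datasets`, `create_item`, `post_item`, the dataset names, and the JSON lines record layout are all hypothetical, and the actual error file format may differ.

```python
import json


def process(datasets, create_item, post_item):
    """Run both stages per dataset and collect failures instead of aborting."""
    errors = []
    for name in datasets:
        try:
            item = create_item(name)
        except Exception as exc:
            errors.append({"dataset": name, "stage": "create", "error": repr(exc)})
            continue  # nothing to post if the item could not be built
        try:
            post_item(item)
        except Exception as exc:
            errors.append({"dataset": name, "stage": "post", "error": repr(exc)})
    return errors


def failed_datasets(lines):
    """Recover dataset names from JSON lines error records for a re-run."""
    return [json.loads(line)["dataset"] for line in lines if line.strip()]


# Hypothetical stand-ins for item generation and posting.
def create_item(name):
    if name == "bad_metadata.nc":
        raise ValueError("missing attribute")
    return {"id": name}


def post_item(item):
    if item["id"] == "rejected.nc":
        raise RuntimeError("server returned 400")


errors = process(["ok.nc", "bad_metadata.nc", "rejected.nc"], create_item, post_item)
error_lines = [json.dumps(e) for e in errors]  # what would be written to the error file
retry = failed_datasets(error_lines)
```

Feeding `retry` back into a later run would let the populator skip the datasets that already succeeded rather than crawling the whole catalog again.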