crim-ca / stac-populator

Workflow logic to populate STAC catalog with demo datasets.
MIT License

Adding feature to intercept errors and save information to files #45

Closed dchandan closed 9 months ago

dchandan commented 10 months ago

Adding a feature to intercept errors encountered while (i) generating STAC items and (ii) posting STAC items to the server. Information about the dataset and the error it encountered is saved to separate log files. An example log file from a run of the stac-populator is included below.

stac-populator_CMIP6populator_errors_20240119-185007.txt
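
For readers unfamiliar with the pattern, here is a minimal sketch of intercepting errors at the two stages described above and appending them to a dedicated error file. It is not the code from this PR: the function name `populate`, the callables `create_item` and `post_item`, and the one-JSON-object-per-line record layout are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def populate(datasets, create_item, post_item, error_file):
    """Process every dataset; record failures instead of aborting the run.

    All names here are hypothetical and do not come from stac-populator.
    """
    failures = 0
    for dataset in datasets:
        stage = "create"  # tracks which step raised, for the error record
        try:
            item = create_item(dataset)
            stage = "post"
            post_item(item)
        except Exception as exc:
            failures += 1
            record = {"dataset": dataset, "stage": stage, "error": repr(exc)}
            # Append one JSON object per line (assumed format).
            with Path(error_file).open("a") as fh:
                fh.write(json.dumps(record) + "\n")
    return failures


def fake_create(dataset):
    # Stand-in for STAC item generation.
    if dataset == "bad.nc":
        raise ValueError("missing attribute")
    return {"id": dataset}


def fake_post(item):
    # Stand-in for posting the item to the server.
    if item["id"] == "offline.nc":
        raise ConnectionError("server unreachable")


with tempfile.TemporaryDirectory() as tmp:
    error_path = Path(tmp) / "errors.txt"
    n_failed = populate(
        ["ok.nc", "bad.nc", "offline.nc"], fake_create, fake_post, error_path
    )
    records = [json.loads(line) for line in error_path.read_text().splitlines()]
```

Recording the failing stage alongside the dataset makes it possible to tell item-generation problems apart from server-side problems when reviewing the file later.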

huard commented 10 months ago

I think this PR branches off the other one about the crawler. I'll wait for you to merge the first one before reviewing.

dchandan commented 10 months ago

I've merged it now.

dchandan commented 10 months ago

No, that's a good question. (i) I didn't know how to add a common file handler to the multiple loggers that are generated in the get_logger function. (ii) The log files can get very busy with other output, and I thought it might be useful to have a separate, clean file with error data that I can then work on fixing. I could always just ack or sed the log files, but for the moment this felt more useful because of the next reason. (iii) I am hoping that we can write some logic for the ErrorLoader that allows us to feed this separate error file (whose format can evolve to include more information) into a new run of stac-populator, so that the populator only works on the files that produced errors rather than on all files in a catalog. Even if data is not posted for existing items, working on an entire catalog is still slow because all items have to be crawled through.

Thoughts, comments, suggestions?
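
To make idea (iii) concrete, here is a rough sketch of how an error file could be turned back into a work list for a later run. This is not the ErrorLoader itself, whose design is left open in this thread. The function name `load_failed_datasets` and the file format (one JSON object per line with a `dataset` key) are assumptions made only for this example.

```python
import json
import tempfile
from pathlib import Path


def load_failed_datasets(error_file):
    """Return unique dataset identifiers from an error file, in first-seen order.

    Assumes one JSON object per line with a "dataset" key (hypothetical format).
    """
    seen = {}  # dict keeps insertion order and removes duplicates
    with Path(error_file).open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            seen.setdefault(record["dataset"], None)
    return list(seen)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "errors.txt"
    path.write_text(
        '{"dataset": "a.nc", "error": "ValueError"}\n'
        "\n"
        '{"dataset": "b.nc", "error": "ConnectionError"}\n'
        '{"dataset": "a.nc", "error": "ConnectionError"}\n'
    )
    to_retry = load_failed_datasets(path)
```

A new run could then iterate over `to_retry` instead of crawling the whole catalog.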

huard commented 10 months ago

It's possible to configure the LOGGER object to write both to the terminal and to a file on disk. So if that meets the requirements, it could be an option.

In another project I've used this:

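import logging

def init_log():
    logger = logging.getLogger(__name__)
    # The logger itself must let INFO through so the file handler receives it.
    logger.setLevel(logging.INFO)
    # Terminal handler: only ERROR and above.
    sh = logging.StreamHandler()
    sh.setLevel(logging.ERROR)
    logger.addHandler(sh)
    # File handler: INFO and above (the log/ directory must already exist).
    fh = logging.FileHandler('log/run_MAGICC.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)
    return logger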

which logs ERRORs to the terminal, but stores all INFO messages to a file.

fmigneault commented 10 months ago

@dchandan A timely video that came up in my watch list today: https://www.youtube.com/watch?v=9L77QExPmI0 (see the "JSON logs" and "extra parameter" timestamps for some inspiration).

dchandan commented 9 months ago

Thanks for the comments and the useful video on the log file. I do like the JSON lines output format shown in the video as it would make parsing the log files for issues much easier. I've now revamped the whole app logging to use the setup in the video.
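
As an illustration of the JSON lines idea, a minimal sketch using only the Python standard library follows. It is not the setup from the video or the code merged in this PR. The class `JSONLinesFormatter`, the helper `make_json_logger`, and the convention of passing a `context` dictionary through `extra` are assumptions for the example.

```python
import io
import json
import logging


class JSONLinesFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}}.
        context = getattr(record, "context", None)
        if context is not None:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def make_json_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONLinesFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_json_logger("stac_populator_demo", buffer)
log.error(
    "failed to create STAC item",
    extra={"context": {"dataset": "example.nc"}},
)
entry = json.loads(buffer.getvalue().splitlines()[0])
```

Because every line is a complete JSON object, error entries can be filtered with a few lines of Python or a tool such as jq rather than with ack or sed patterns.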

dchandan commented 9 months ago

Is it good to merge now?

dchandan commented 9 months ago

@fmigneault Ok, we can do that later. Kinda pressed for time rn, so we can circle back to this later. I'll make an issue of this, so it is documented.