DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Better logs when accessing large files from S3 #4950

adamnovak closed 3 weeks ago

adamnovak commented 1 month ago

This should fix #4867 by changing the logging and exception handling around bucket location finding.

We now log a successful bucket location lookup at the same level as a failed one (DEBUG), and leave reporting at higher levels up to the caller. Most of the callers seem to have a fallback to use when a bucket location isn't available, so they don't have to log anything at a higher level.
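As a rough illustration of that pattern (not the actual Toil code), here is a minimal sketch using only the standard `logging` module. The names `get_bucket_region`, `region_or_default`, `NoBucketLocationError`, and the `us-east-1` default are all hypothetical, and the lookup strategies are passed in as plain callables so the example runs without AWS access.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("bucket_location_sketch")


class NoBucketLocationError(Exception):
    """Hypothetical error: no lookup strategy could find the bucket's region."""


def get_bucket_region(bucket: str, lookups: Sequence[Callable[[str], str]]) -> str:
    """Try each lookup strategy in order; log success and failure alike at DEBUG."""
    for lookup in lookups:
        try:
            region = lookup(bucket)
        except Exception as e:
            # A failed strategy is only DEBUG: the caller decides if it matters.
            logger.debug("Could not get location of bucket %s: %s", bucket, e)
            continue
        # Success is logged at the same level as failure.
        logger.debug("Found location of bucket %s: %s", bucket, region)
        return region
    raise NoBucketLocationError(f"Could not get location of bucket {bucket}")


def region_or_default(
    bucket: str,
    lookups: Sequence[Callable[[str], str]],
    default: str = "us-east-1",  # assumed fallback, for illustration only
) -> str:
    """A caller with a fallback: it has nothing to report above DEBUG."""
    try:
        return get_bucket_region(bucket, lookups)
    except NoBucketLocationError:
        return default
```

A caller without a usable fallback would instead catch `NoBucketLocationError` and log at WARNING or ERROR itself, which keeps the decision about severity in one place.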

Also, to keep the user from getting bored during file imports, I am logging them at INFO:

[2024-05-23T16:30:15-0400] [MainThread] [I] [toil.jobStores.abstractJobStore] Importing input s3://human-pangenomics/NHGRI_UCSC_panel/HG002/hpp_HG002_NA24385_son_v1/PacBio_HiFi/20kb/m64011_190901_095311.Q20.fastq...
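For readers who want to see the shape of this, the following is a small sketch of per-file INFO logging during imports. It is not the code in `toil.jobStores.abstractJobStore`: `import_inputs`, `import_one`, and the logger name are hypothetical, and the format string is only an assumption that approximates the layout of the log line above.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("import_logging_sketch")


def import_inputs(urls: Iterable[str], import_one: Callable[[str], str]) -> List[str]:
    """Import each URL with the given callable, announcing each one at INFO."""
    imported = []
    for url in urls:
        # Logged before the (possibly slow) transfer starts, so the user
        # can see which file is currently being worked on.
        logger.info("Importing input %s...", url)
        imported.append(import_one(url))
    return imported


if __name__ == "__main__":
    # Assumed format, chosen to resemble the log line shown above.
    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)s] [%(threadName)s] [%(levelname).1s] [%(name)s] %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S%z",
    )
    # Stand-in importer that just fabricates an ID string for the demo.
    import_inputs(["s3://example-bucket/reads.fastq"], lambda url: "id-for-" + url)
```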
