DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Stitching log message is too large #3658

Open amarjandu opened 3 years ago

amarjandu commented 3 years ago
[INFO] 2021-11-20T00:05:49.033Z b66d0a71-ea5c-5def-a267-5063d7b99b18 Stitched 2244 bundle(s): {SourcedBundleFQID(uuid='d85a260a-fdc3-436e-8743-71f242cad989',
...

This is followed by 2244 instances of SourcedBundleFQID. This causes CloudWatch to split the log line into multiple log entries. It's also impractically large.

nadove-ucsc commented 3 years ago

Since the stitched subgraphs are all from the same source, we don't gain any information from logging the source attribute of the FQIDs. If we change processed and its related variables from Set[SourcedBundleFQID] to Set[BundleFQID], or upcast the elements of processed to BundleFQID before logging, it would reduce the number of characters logged for each stitched subgraph from approximately 343 (it varies slightly depending on the source spec) to 94, a decrease of roughly 72%.
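A minimal sketch of the upcasting option, for illustration only. The two classes below are hypothetical dataclass stand-ins, not Azul's actual definitions. In particular, the source attribute is modelled as a plain string, and the upcast and upcast_all helpers are assumptions made for this example.

```python
from dataclasses import dataclass


# Hypothetical stand-ins; Azul's real classes may be defined differently.
@dataclass(frozen=True)
class BundleFQID:
    uuid: str
    version: str


@dataclass(frozen=True)
class SourcedBundleFQID(BundleFQID):
    # Assumption: modelled as a string here; the real attribute is richer.
    source: str

    def upcast(self) -> BundleFQID:
        # Rebuild the parent type, dropping the source attribute
        return BundleFQID(uuid=self.uuid, version=self.version)


def upcast_all(fqids):
    # Convert a set of SourcedBundleFQID into a set of plain BundleFQID
    return {fqid.upcast() for fqid in fqids}
```

Logging the result of upcast_all(processed) instead of processed itself would leave out the repeated source information. The exact saving depends on how long the real source representation is.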

dsotirho-ucsc commented 3 years ago

@hannes-ucsc to devise solution.

hannes-ucsc commented 2 years ago

Something like


import logging

log.info('Stitched on %i bundles', len(fqids))
if log.isEnabledFor(logging.DEBUG):
    for fqid in fqids:
        # Use the parent class's repr so the source attribute is omitted
        log.debug('Stitched on bundle %s', BundleFQID.__repr__(fqid))
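A self-contained sketch of how this pattern might behave, under some assumptions: the two FQID classes are hypothetical dataclass stand-ins rather than Azul's real ones, and log_stitched, ListHandler and the logger name are invented for this example. With dataclasses, the parent's generated __repr__ still prints the subclass name but only the parent's fields, so the source is left out.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger('stitching_sketch')  # hypothetical logger name


# Hypothetical stand-ins for Azul's classes
@dataclass(frozen=True)
class BundleFQID:
    uuid: str
    version: str


@dataclass(frozen=True)
class SourcedBundleFQID(BundleFQID):
    source: str  # assumption: a plain string here


def log_stitched(fqids):
    # One short summary line at INFO, however many bundles were stitched
    log.info('Stitched on %i bundles', len(fqids))
    # Skip the per-bundle loop entirely unless DEBUG is enabled
    if log.isEnabledFor(logging.DEBUG):
        for fqid in fqids:
            # The parent's __repr__ only includes the parent's fields
            log.debug('Stitched on bundle %s', BundleFQID.__repr__(fqid))


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs so the output can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))
```

At INFO level this emits a single short line per stitching operation. The per-bundle lines appear only at DEBUG, each as its own small log entry rather than as part of one oversized message.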