NationalGenomicsInfrastructure / anglerfish

Anglerfish - Nanopore reads from Illumina libraries
MIT License

add more metadata in the anglerfish report #88

Status: Open. Opened by FranBonath 1 month ago.

FranBonath commented 1 month ago

Currently it is somewhat difficult to connect the Anglerfish report to a flowcell, if all you have is the report itself. It would be nice to include at least the flowcell / run folder name, so we know from which run the data originated. Further, adding the pool name can help greatly in cases where we have different pools for the same project.

Finally, I would add the number of "reads mapped to samples", alongside the number of reads mapping to adapters, to the basic statistics at the top of the report.

Thank you for your consideration :)

remiolsen commented 1 month ago

> Currently it is somewhat difficult to connect the Anglerfish report to a flowcell, if all you have is the report itself. It would be nice to include at least the flowcell / run folder name, so we know from which run the data originated. Further, adding the pool name can help greatly in cases where we have different pools for the same project.

I don't see any problem adding these as optional metadata to anglerfish run, e.g. --flowcell and --pool, and storing them in the report(s). I want anglerfish to be as portable as possible and not rely on any folder structure to determine these things. I envision that some upstream process, e.g. TACA at NGI, would be responsible for filling in these values.
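A minimal sketch of what the two optional flags could look like, assuming a plain argparse interface. The real anglerfish CLI may be built differently, and the names `build_parser` and `report_header` are hypothetical. The point is only that both values default to unset and are supplied by the caller rather than inferred from the run folder.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the two optional metadata flags."""
    parser = argparse.ArgumentParser(prog="anglerfish")
    # Both flags are optional: anglerfish never infers them from a folder
    # structure; an upstream tool (e.g. TACA) would pass them in.
    parser.add_argument("--flowcell", default=None, help="Flowcell ID or run folder name")
    parser.add_argument("--pool", default=None, help="Pool name")
    return parser


def report_header(args: argparse.Namespace) -> dict:
    """Collect the optional metadata for the report, dropping unset flags."""
    meta = {"flowcell": args.flowcell, "pool": args.pool}
    return {key: value for key, value in meta.items() if value is not None}


if __name__ == "__main__":
    args = build_parser().parse_args(["--flowcell", "FLOWCELL_ID", "--pool", "POOL_1"])
    print(json.dumps(report_header(args)))
```

Leaving unset flags out of the report (instead of writing empty values) keeps reports from standalone runs identical to what they are today.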

> Finally, I would add the number of "reads mapped to samples", alongside the number of reads mapping to adapters, to the basic statistics at the top of the report.

I think what you are asking for is partly related to #64. If so, both the number of reads matching barcodes and the number of reads not matching barcodes should be reported.
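As a hypothetical illustration of the two numbers, the helper below tallies per-read barcode assignments. It assumes, and this is not anglerfish's actual data model, that each adapter-matching read has already been assigned either a sample name or `None` when no barcode matched.

```python
from collections import Counter
from typing import Iterable, Optional


def barcode_summary(assignments: Iterable[Optional[str]]) -> dict:
    """Summarise barcode assignments for adapter-matching reads.

    Each entry is the sample name whose barcode the read matched,
    or None if the read matched no barcode (hypothetical input format).
    """
    counts = Counter(assignments)
    # Reads without a barcode match are counted under the None key.
    unmatched = counts.pop(None, 0)
    return {
        "reads_matching_barcodes": sum(counts.values()),
        "reads_not_matching_barcodes": unmatched,
        "reads_per_sample": dict(counts),
    }


print(barcode_summary(["S1", "S2", None, "S1", None]))
```

The two totals add up to the number of adapter-matching reads, so the base statistics would show how that figure splits between samples and unassigned reads.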

kedhammar commented 3 weeks ago

@remiolsen Do we want to create new args for each metadata key or can we simply have an arg accepting any custom metadata key-value pairs? E.g. --metadata { 'flowcell': 'asdf', 'run_dir': '/asdf/asdf' }?

remiolsen commented 3 weeks ago

@kedhammar Yes, that would be a tidier solution :+1:
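One way the single --metadata argument could parse the value from the proposal, sketched with argparse. `parse_metadata` is a hypothetical helper, not existing anglerfish code. It uses `ast.literal_eval` because the proposed value uses single quotes, which `json.loads` would reject, and unlike `eval` it only evaluates Python literals.

```python
import argparse
import ast


def parse_metadata(text: str) -> dict:
    """Parse a dict literal such as "{'flowcell': 'asdf'}" into str -> str."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError) as err:
        raise argparse.ArgumentTypeError(f"invalid metadata: {err}") from err
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("metadata must be a key-value mapping")
    # Store everything as strings so the report can serialise it as-is.
    return {str(key): str(val) for key, val in value.items()}


parser = argparse.ArgumentParser(prog="anglerfish")
parser.add_argument("--metadata", type=parse_metadata, default={},
                    help="Custom key-value pairs to store in the report")

args = parser.parse_args(["--metadata", "{ 'flowcell': 'asdf', 'run_dir': '/asdf/asdf' }"])
print(args.metadata)
```

A repeatable --metadata KEY=VALUE flag would avoid shell-quoting a whole dict and is an equally reasonable design; which fits better depends on how upstream tools such as TACA assemble the command line.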