fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Explicitly store number and names of multiplexed samples and use in reporting #955

Closed pjbriggs closed 2 months ago

pjbriggs commented 3 months ago

Background: for libraries that contain multiplexed samples (for example, 10x Genomics CellPlex and Flex data), the reporting currently returns the number and names of the multiplexed samples (derived from the cellranger multi config file) instead of the number and names of the "physical" samples (which correspond to the Fastq files).

In the current implementation, the information about multiplexed samples is not explicitly stored anywhere; instead it is worked out on the fly at reporting time. This is a problem for other types of data (currently Parse Evercode data, but potentially others in future), where the reporting only returns the physical sample information.

As it is not possible at this stage to derive the information for Parse data, an initial proposal to address the problem is:

For null values, the reporting could fall back to the existing mechanisms (though for 10x Genomics CellPlex and Flex data the metadata should possibly also be set explicitly for completeness, e.g. at the time of archiving).
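
The fallback behaviour described above could be sketched roughly as follows. This is only an illustration, not the actual auto_process_ngs code: the `ProjectMetadata` class, its field names and the `samples_for_report` function are all hypothetical, and `derive_multiplexed` stands in for whatever existing mechanism works out the multiplexed samples (for example, from the cellranger multi config file).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ProjectMetadata:
    """Hypothetical stand-in for a project's stored metadata."""
    # Names of the "physical" samples (i.e. those matching the Fastq files)
    physical_samples: List[str]
    # Explicitly stored multiplexed sample names; None (null) means "not set"
    multiplexed_samples: Optional[List[str]] = None


def samples_for_report(
    meta: ProjectMetadata,
    derive_multiplexed: Optional[Callable[[], Optional[List[str]]]] = None,
) -> Tuple[int, List[str]]:
    """Return (number, names) of the samples that reporting should show."""
    if meta.multiplexed_samples is not None:
        # Explicitly stored values take precedence
        names = list(meta.multiplexed_samples)
    else:
        # Null value: fall back to the existing on-the-fly mechanism, if any
        derived = derive_multiplexed() if derive_multiplexed else None
        # If nothing can be derived, report the physical samples
        names = list(derived) if derived else list(meta.physical_samples)
    return len(names), names


# Metadata set explicitly (e.g. for data where derivation is not possible)
explicit = ProjectMetadata(["PJB1"], multiplexed_samples=["A", "B", "C"])
print(samples_for_report(explicit))  # (3, ['A', 'B', 'C'])

# Metadata not set: the (hypothetical) derivation callback is used instead
legacy = ProjectMetadata(["PJB1"])
print(samples_for_report(legacy, lambda: ["X", "Y"]))  # (2, ['X', 'Y'])
```

In this sketch an explicitly stored empty list is treated as a real value (zero multiplexed samples) rather than as null; whether that distinction is wanted would be a design decision for the actual implementation.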