fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Check for invalid probe barcodes in '10x_multi_config.csv' files #917

Open pjbriggs opened 7 months ago

pjbriggs commented 7 months ago

A common error when preparing 10x_multi_config.csv files for 10x Genomics CellPlex and Flex data, from the template generated by setup_analysis_dirs, is forgetting to remove the example probe barcode and sample definition lines in the [samples] section.

The template line is:

MULTIPLEXED_SAMPLE,BC001|BC002|...,DESCRIPTION

When this line is left in, cellranger multi fails its pre-flight checks with the message:

[samples] row 2 has invalid probe_barcode_ids 'BC001|BC002|...' at line: 15, col: 20: must match [A-Za-z0-9_]+

As cellranger multi jobs need significant resources, it would be desirable to catch this error before the QC pipeline attempts to queue the corresponding task.
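
A rough sketch of what such a pre-check might look like, not an existing auto_process_ngs function. The names `samples_rows` and `find_invalid_barcodes` are hypothetical. It assumes that the second column of each [samples] row holds the '|'-separated barcode IDs, that a header row (if present) starts with `sample_id`, and that each ID must fully match the pattern quoted in the cellranger error message above:

```python
import re

# Pattern taken from the cellranger error message quoted above
VALID_ID = re.compile(r"[A-Za-z0-9_]+")

def samples_rows(text):
    """Yield (line_number, fields) for data rows in the [samples] section."""
    in_samples = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if line.startswith("["):
            # A new section starts; only track rows under [samples]
            in_samples = line.lower().startswith("[samples]")
            continue
        if not in_samples or not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        if fields[0] == "sample_id":
            # Assumed header row, not a sample definition
            continue
        yield lineno, fields

def find_invalid_barcodes(text):
    """Return a list of (line_number, sample_id, bad_barcode) tuples."""
    errors = []
    for lineno, fields in samples_rows(text):
        # Assumption: barcode IDs are in the second column, '|'-separated
        barcodes = fields[1].split("|") if len(fields) > 1 else [""]
        for barcode in barcodes:
            if not VALID_ID.fullmatch(barcode):
                errors.append((lineno, fields[0], barcode))
    return errors
```

With the template line left in place, this would flag the literal `...` entry, so the pipeline could stop with a clear message before submitting the cellranger multi job.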

pjbriggs commented 7 months ago

Additionally, the check could also look for other problems such as repeated probe barcode IDs, for example (from cellranger multi output):

Re-used probe_barcode_ids ('WT1m') provided for sample 'PB2' (already provided for sample 'PB1')
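
The repeated-ID case could be sketched along these lines. Again this is only an illustration: `find_reused_barcodes` is a hypothetical name, and the input is assumed to be (sample ID, '|'-separated barcode string) pairs already extracted from the [samples] section:

```python
def find_reused_barcodes(samples):
    """Return (barcode, sample, first_sample) for each re-used barcode ID.

    'samples' is an iterable of (sample_id, barcode_ids) pairs, where
    barcode_ids is a '|'-separated string (assumed layout).
    """
    first_seen = {}  # barcode ID -> sample that used it first
    reused = []
    for sample_id, barcode_ids in samples:
        for barcode in barcode_ids.split("|"):
            if barcode in first_seen:
                # Also reports a barcode repeated within the same sample
                reused.append((barcode, sample_id, first_seen[barcode]))
            else:
                first_seen[barcode] = sample_id
    return reused

# Mirrors the cellranger message above: 'WT1m' given for both PB1 and PB2
problems = find_reused_barcodes([("PB1", "WT1m"), ("PB2", "WT1m")])
for barcode, sample, first in problems:
    print(f"Re-used barcode '{barcode}' for sample '{sample}' "
          f"(already provided for sample '{first}')")
```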