fls-bioinformatics-core / auto_process_ngs

Scripts and utilities for automatic processing & management of Illumina NGS sequencing data.

Allow subsets of samples to be specified in 'transfer_data.py' #950

Closed pjbriggs closed 3 weeks ago

pjbriggs commented 4 months ago

It would be useful on occasion to be able to specify subsets of samples when running the transfer_data.py utility.

The existing --filter option allows some selection based on patterns, but it doesn't seem flexible enough to be generally applicable for this (I mostly use it for selecting subsets of reads).

Something like a new --samples option could allow patterns of the form 'PB1-12', 'PB1,PB2,PB3', 'PB1,PB3-PB6' etc., with sample selection being applied before filtering.

pjbriggs commented 3 weeks ago

For an initial implementation the --samples option should just take a comma-separated list of sample names; the more advanced patterns could be problematic to implement generally (for example, hyphens and numbers can also appear in sample names, and samples may not be sequentially numbered).
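
A rough sketch of how the comma-separated form might look, with sample selection applied before any filtering. This is only an illustration, not the code in transfer_data.py: the helper names (parse_sample_list, select_samples, select_files, build_parser) and the dictionary mapping sample names to Fastq file names are hypothetical, and the pattern argument only stands in for whatever --filter does.

```python
import argparse
import fnmatch


def parse_sample_list(value):
    """Split a comma-separated string into a list of sample names.

    Surrounding whitespace and empty entries are dropped.
    """
    names = [name.strip() for name in value.split(",")]
    return [name for name in names if name]


def select_samples(sample_names, requested):
    """Return the requested samples, keeping the original ordering.

    Raises ValueError if any requested name is not a known sample.
    """
    missing = [name for name in requested if name not in sample_names]
    if missing:
        raise ValueError("Unknown sample(s): " + ", ".join(missing))
    wanted = set(requested)
    return [name for name in sample_names if name in wanted]


def select_files(files_by_sample, samples=None, pattern=None):
    """Pick files for transfer: sample selection first, then filtering.

    files_by_sample is a hypothetical mapping of sample name to a list
    of file names; pattern is a glob-style stand-in for --filter.
    """
    names = list(files_by_sample)
    if samples is not None:
        names = select_samples(names, samples)
    files = [f for name in names for f in files_by_sample[name]]
    if pattern is not None:
        files = [f for f in files if fnmatch.fnmatch(f, pattern)]
    return files


def build_parser():
    """Build a parser exposing a hypothetical --samples option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--samples",
        type=parse_sample_list,
        default=None,
        help="comma-separated list of sample names to transfer",
    )
    return parser
```

With this, build_parser().parse_args(["--samples", "PB1,PB3"]).samples would be a plain list of names that can be passed to select_files. Because names are matched exactly, hyphens and digits inside sample names need no special handling.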