Currently, if incorrect primer sequences are specified in the samplesheet, `SPLIT_LOCI` will yield empty files and the pipeline will end early without error. Because `READ_TRACKING` won't have run, there will be no obvious sign to an inexperienced user that the primer sequences may have been wrong. This partially stems from `READ_FILTER` having optional outputs, which I would rather keep at this stage.
One way to handle this could be to check for the presence of primer sequences at the start and/or end of the reads; if they are found in fewer than a specified proportion of reads, the pipeline would fail with an informative error. Alternatively, if either `SPLIT_LOCI` or `PRIMER_TRIM` outputs too many empty read files, the pipeline could fail (perhaps with a pipeline parameter to toggle this behaviour on/off).
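Both checks could look roughly like the minimal Python sketch below. It is not tied to the pipeline's code: the function names, the `min_fraction`/`max_empty_fraction` thresholds and the `fail_on_empty` toggle are hypothetical. It assumes reads are merged amplicons in forward orientation, so the forward primer sits at the 5' end and the reverse primer appears reverse-complemented at the 3' end, and that primers may contain IUPAC degenerate bases. The empty-output check works on per-file read counts rather than the files themselves.

```python
import re
from typing import Iterable

# IUPAC nucleotide codes mapped to regex character classes (degenerate primers).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse-complement a (possibly degenerate) nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer into a regex where degenerate bases become classes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_hit_fractions(reads: Iterable[str], fwd: str, rev: str) -> tuple[float, float]:
    """Return (fraction starting with fwd, fraction ending with revcomp(rev))."""
    fwd_re = primer_regex(fwd)
    rev_re = primer_regex(revcomp(rev))
    n = fwd_hits = rev_hits = 0
    for read in reads:
        read = read.upper()
        n += 1
        if fwd_re.match(read):                  # anchored at the 5' end
            fwd_hits += 1
        if rev_re.fullmatch(read[-len(rev):]):  # anchored at the 3' end
            rev_hits += 1
    if n == 0:
        return 0.0, 0.0
    return fwd_hits / n, rev_hits / n


def check_primers(reads, fwd, rev, min_fraction=0.5):
    """Fail with an informative error if either primer is found too rarely."""
    fwd_frac, rev_frac = primer_hit_fractions(reads, fwd, rev)
    if fwd_frac < min_fraction or rev_frac < min_fraction:
        raise ValueError(
            f"Primers found in too few reads (forward at start: {fwd_frac:.1%}, "
            f"reverse at end: {rev_frac:.1%}; threshold {min_fraction:.0%}). "
            "Check the primer sequences in the samplesheet."
        )
    return fwd_frac, rev_frac


def check_empty_outputs(read_counts: dict[str, int], max_empty_fraction=0.5,
                        fail_on_empty=True):
    """Return the names of empty outputs; fail if too many are empty."""
    empty = sorted(name for name, count in read_counts.items() if count == 0)
    if fail_on_empty and read_counts and len(empty) / len(read_counts) > max_empty_fraction:
        raise ValueError(f"{len(empty)}/{len(read_counts)} outputs are empty: {', '.join(empty)}")
    return empty
```

Regex matching keeps degenerate primers simple. A real check might also tolerate a few mismatches, which this sketch does not.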
In addition, the pipeline could check for common sequences at the starts and ends of reads and display them to the user in the error message, which might help them realise what the true primers are and adjust the inputs accordingly.
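A rough Python sketch of that hint follows: count the most frequent k-mers at the start of each read and at the end, and print them. The names `common_read_ends` and `format_hint` and the defaults `k=20` and `top=3` are hypothetical. Read ends are reverse-complemented before counting, on the assumption that reads are in forward orientation, so they are shown in the same orientation as a reverse primer in the samplesheet. Reads are assumed to contain only `A`, `C`, `G`, `T` and `N`.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a read (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def common_read_ends(reads: Iterable[str], k: int = 20, top: int = 3):
    """Return the most frequent k-mers at read starts and (reverse-complemented) ends."""
    starts, ends = Counter(), Counter()
    for read in reads:
        read = read.upper()
        if len(read) < k:
            continue  # too short to contribute a full k-mer
        starts[read[:k]] += 1
        ends[revcomp(read[-k:])] += 1  # reverse-primer orientation
    return starts.most_common(top), ends.most_common(top)


def format_hint(reads: Iterable[str], k: int = 20, top: int = 3) -> str:
    """Build a message listing candidate primers for the error output."""
    reads = list(reads)
    starts, ends = common_read_ends(reads, k, top)
    n = len(reads)
    lines = ["Most common read starts (possible forward primers):"]
    lines += [f"  {seq}  {count}/{n} reads" for seq, count in starts]
    lines.append("Most common read ends, reverse-complemented (possible reverse primers):")
    lines += [f"  {seq}  {count}/{n} reads" for seq, count in ends]
    return "\n".join(lines)
```

Exact k-mer counting will spread counts across variants when primers are degenerate, so the top hits may look like several near-identical sequences rather than one.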
We would also have to think about scenarios where one primer pair is correct but another isn't.
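For the mixed case, one option is to evaluate each primer pair separately, so a single wrong pair fails the check even when the other loci look fine. The error can then name exactly which pairs to fix. This minimal Python sketch makes several assumptions. It only checks forward primers at the 5' end, for brevity. The locus names in the example are illustrative. `per_locus_fractions`, `check_each_pair` and the `min_fraction` default are hypothetical.

```python
import re
from typing import Iterable

# IUPAC nucleotide codes mapped to regex character classes (degenerate primers).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def per_locus_fractions(reads: Iterable[str], primers: dict[str, str]) -> dict[str, float]:
    """Fraction of reads whose 5' end matches each locus's forward primer."""
    patterns = {locus: re.compile("".join(IUPAC[b] for b in fwd.upper()))
                for locus, fwd in primers.items()}
    counts = dict.fromkeys(primers, 0)
    n = 0
    for read in reads:
        n += 1
        read = read.upper()
        for locus, pattern in patterns.items():
            if pattern.match(read):
                counts[locus] += 1
    return {locus: (c / n if n else 0.0) for locus, c in counts.items()}


def check_each_pair(reads, primers: dict[str, str], min_fraction=0.05):
    """Fail if any single locus's primer matches too few reads, naming only those loci."""
    fractions = per_locus_fractions(reads, primers)
    bad = {locus: f for locus, f in fractions.items() if f < min_fraction}
    if bad:
        detail = ", ".join(f"{locus} ({f:.1%})" for locus, f in sorted(bad.items()))
        raise ValueError(f"Forward primers matched too few reads for: {detail}")
    return fractions
```

Example: with loci `{"16S": "ACGT", "COI": "GGCY", "ITS": "CATG"}` and no reads starting with `CATG`, only `ITS` is reported. Because loci in a multiplexed run share the reads, `min_fraction` would need to be set well below `1 / number_of_loci`.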