Informative error when samplesheet contains incorrect primer sequences

Currently, if incorrect primer sequences are specified in the samplesheet, SPLIT_LOCI produces empty files and the pipeline ends early without an error. Because READ_TRACKING never runs, an inexperienced user gets no obvious sign that the primer sequences may have been wrong. This is partly because READ_FILTER has optional outputs, which I would rather keep at this stage.

One way to handle this could be to check for the presence of primer sequences at the start and/or end of the reads and, if they are found in fewer than a specified proportion of reads, fail the pipeline with an informative error. Alternatively, if either SPLIT_LOCI or PRIMER_TRIM outputs too many empty read files, the pipeline could fail (perhaps with a pipeline parameter to toggle this behaviour on or off).
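Both checks are cheap to express. The sketch below is illustrative Python rather than freyr's actual Nextflow/R code, and every name in it (`primer_at_start`, `check_primer_presence`, `check_empty_outputs`, the `min_fraction` and `max_empty_fraction` thresholds) is hypothetical. It assumes primers may contain IUPAC degenerate bases, that the forward primer is expected at the start of forward reads (and the reverse primer at the start of reverse reads) before trimming, and that read files may be gzipped.

```python
import gzip
from typing import Iterable, List

# IUPAC nucleotide codes -> bases they match (primers are often degenerate).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_at_start(read: str, primer: str, max_mismatches: int = 0) -> bool:
    """True if `read` begins with `primer`, honouring degenerate bases and allowing a few mismatches."""
    if len(read) < len(primer):
        return False
    mismatches = 0
    for base, code in zip(read.upper(), primer.upper()):
        if base not in IUPAC.get(code, ""):
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def primer_match_fraction(reads: Iterable[str], primer: str, max_mismatches: int = 0) -> float:
    """Fraction of reads that start with `primer` (0.0 if there are no reads)."""
    total = hits = 0
    for read in reads:
        total += 1
        if primer_at_start(read, primer, max_mismatches):
            hits += 1
    return hits / total if total else 0.0


def check_primer_presence(reads: List[str], primer: str, min_fraction: float = 0.5) -> None:
    """Hypothetical check: fail with an informative error if too few reads carry the primer."""
    frac = primer_match_fraction(reads, primer)
    if frac < min_fraction:
        raise ValueError(
            f"Only {frac:.1%} of reads start with primer {primer} "
            f"(threshold {min_fraction:.0%}). Check the primer sequences in the samplesheet."
        )


def is_empty_read_file(path: str) -> bool:
    """An empty .gz file is not 0 bytes on disk, so decompress and look for any content."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return handle.read(1) == b""


def check_empty_outputs(paths: List[str], max_empty_fraction: float = 0.5) -> None:
    """Hypothetical check: fail if too many split/trimmed read files are empty."""
    if not paths:
        return
    empty = [p for p in paths if is_empty_read_file(p)]
    if len(empty) / len(paths) > max_empty_fraction:
        raise RuntimeError(
            f"{len(empty)} of {len(paths)} read files are empty after primer "
            "matching; the primer sequences in the samplesheet may be incorrect."
        )
```

In practice the primer check would probably run on a subsample of reads per sample, and the thresholds would be exposed as pipeline parameters so the behaviour can be relaxed or switched off.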

In addition, the pipeline could check for common sequences at the starts and ends of reads and display them to the user in the error message, which might help them realise what the true primers are and adjust the inputs accordingly.
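Finding candidate primers can be as simple as counting the most frequent fixed-length sequences at each end of the reads. This is a minimal sketch, not existing freyr code; `common_read_edges` and `primer_hint` are hypothetical names, and the default length of 20 bp is an assumption chosen to be roughly primer-sized.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def common_read_edges(reads: Iterable[str], length: int = 20, top: int = 5,
                      at_end: bool = False) -> List[Tuple[str, int]]:
    """Return the `top` most frequent `length`-bp sequences at the start (or end) of reads."""
    counts = Counter(
        (read[-length:] if at_end else read[:length]).upper()
        for read in reads
        if len(read) >= length  # skip reads too short to contain a full window
    )
    return counts.most_common(top)


def primer_hint(reads: List[str], length: int = 20, top: int = 3) -> str:
    """Build a message listing common read starts and ends, to append to an error."""
    lines = [f"Most common {length}-bp read starts (possible primer sequences):"]
    lines += [f"  {seq}  ({n} reads)" for seq, n in common_read_edges(reads, length, top)]
    lines.append(f"Most common {length}-bp read ends:")
    lines += [f"  {seq}  ({n} reads)"
              for seq, n in common_read_edges(reads, length, top, at_end=True)]
    return "\n".join(lines)
```

Read starts are likely to be the more useful hint, since read ends can shift after quality trimming; if the true primer is degenerate, the top few starts will differ only at the degenerate positions.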

We would also have to think about scenarios where one primer pair is correct but another isn't.
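With several loci, a single global threshold can hide one wrong primer pair, because the correct pair still accounts for many reads. One option, sketched below under the assumption that each read pair comes from exactly one locus, is to assign reads to primer pairs and flag any pair that attracts too few reads. The names `reads_per_primer_pair` and `failing_pairs` are hypothetical and not part of freyr.

```python
from typing import Dict, List, Tuple

# IUPAC nucleotide codes -> bases they match.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}


def starts_with(read: str, primer: str) -> bool:
    """Exact (degenerate-aware) match of `primer` at the start of `read`."""
    return len(read) >= len(primer) and all(
        base in IUPAC.get(code, "") for base, code in zip(read.upper(), primer.upper())
    )


def reads_per_primer_pair(
    read_pairs: List[Tuple[str, str]],
    primer_pairs: Dict[str, Tuple[str, str]],
) -> Tuple[Dict[str, int], int]:
    """Count read pairs matching each (forward, reverse) primer pair; also return unassigned."""
    counts = {name: 0 for name in primer_pairs}
    unassigned = 0
    for fwd_read, rev_read in read_pairs:
        for name, (fwd_primer, rev_primer) in primer_pairs.items():
            if starts_with(fwd_read, fwd_primer) and starts_with(rev_read, rev_primer):
                counts[name] += 1
                break  # assign each read pair to the first matching primer pair
        else:
            unassigned += 1
    return counts, unassigned


def failing_pairs(counts: Dict[str, int], min_reads: int = 1) -> List[str]:
    """Primer pairs that matched fewer than `min_reads` read pairs."""
    return [name for name, n in counts.items() if n < min_reads]
```

Reporting the failing pair names, together with the common read starts among the unassigned reads, would point the user at exactly which samplesheet row to fix.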

AVR-biosecurity-bioinformatics / freyr, issue #19