Support analysis of Influenza B sequence data

BCCDC-PHL / FluViewer

Tool for generating influenza A virus genome sequences from FASTQ data

https://bccdc-phl.github.io/FluViewer/


Support analysis of Influenza B sequence data #23

Open dfornika opened 2 weeks ago

dfornika commented 2 weeks ago

The segment lengths as defined here:

https://github.com/BCCDC-PHL/FluViewer/blob/a71e39a0fd59c28644ee598ae4ceb986c0b913bb/fluviewer/fluviewer.py#L302-L311

...and here:

https://github.com/BCCDC-PHL/FluViewer/blob/a71e39a0fd59c28644ee598ae4ceb986c0b913bb/fluviewer/fluviewer.py#L1589-L1591

...are appropriate for Flu A sequences, but not for Flu B.

Adjust the segment lengths to be compatible with both FluA and FluB sequences.

dfornika commented 3 days ago

This could be handled in a few ways:

1. The user specifies which "mode" they want to run the analysis in up front, using a command-line argument: "Flu A mode" or "Flu B mode".
2. Relax the constraints that are in place so that they accommodate both Flu A and Flu B sequences (but are still effective in catching inappropriate segment lengths that we shouldn't see for either Flu A or Flu B).
3. Dynamically detect which type of Flu sample is being analyzed, and use the appropriate segment length ranges (and possibly other QC/analysis criteria).
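
As a rough sketch of how per-type and relaxed length checks could sit side by side: the snippet below keeps one table of ranges per flu type and derives the relaxed (second) option as the span across all types. The names `SEGMENT_LENGTH_RANGES`, `length_range` and `length_ok` are hypothetical, and the numbers are approximate, illustrative ranges for three segments only. They are not the values currently in fluviewer.py and would need to be replaced with validated ranges.

```python
# Illustrative only: rough length ranges (nt) for three segments.
# These are NOT FluViewer's values; replace with validated ranges.
SEGMENT_LENGTH_RANGES = {
    "A": {"HA": (1650, 1800), "NP": (1500, 1600), "NA": (1350, 1500)},
    "B": {"HA": (1800, 1950), "NP": (1750, 1900), "NA": (1500, 1600)},
}


def length_range(segment, flu_type=None):
    """Return (min, max) for a segment.

    With flu_type=None, return the relaxed range that spans every
    flu type in the table (the "relax the constraints" option).
    """
    if flu_type is not None:
        return SEGMENT_LENGTH_RANGES[flu_type][segment]
    ranges = [by_segment[segment] for by_segment in SEGMENT_LENGTH_RANGES.values()]
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)


def length_ok(segment, length, flu_type=None):
    """Check a segment length against the type-specific or relaxed range."""
    lo, hi = length_range(segment, flu_type)
    return lo <= length <= hi
```

With these placeholder numbers, an HA segment of 1880 nt passes in "B" mode and in relaxed mode, but fails in "A" mode, which is the kind of distinction the relaxed option gives up.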

We could also combine aspects of these approaches. For example, we could take the dynamic approach by default, but allow the user to bypass it and "force" either Flu A or Flu B mode up front. I think that may be the best approach.
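
A minimal sketch of that combined approach, assuming a hypothetical `--flu-type` flag (it does not exist in FluViewer today) and assuming that detection happens elsewhere and hands back "A", "B" or `None`. The function names `build_parser` and `resolve_flu_type` are also made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --flu-type option."""
    parser = argparse.ArgumentParser(prog="fluviewer-sketch")
    # Hypothetical flag: "auto" means detect the type dynamically,
    # "A" or "B" forces that mode and bypasses detection.
    parser.add_argument(
        "--flu-type",
        choices=["auto", "A", "B"],
        default="auto",
        help="flu type to assume (default: detect automatically)",
    )
    return parser


def resolve_flu_type(requested, detected):
    """Pick the flu type to use: a forced choice wins over detection."""
    if requested != "auto":
        return requested
    if detected is None:
        raise ValueError("could not detect flu type; pass --flu-type A or B")
    return detected


# Example: the user forces Flu B even though detection (assumed here) says A.
args = build_parser().parse_args(["--flu-type", "B"])
chosen = resolve_flu_type(args.flu_type, detected="A")
```

Keeping the override and the detection result as separate inputs makes it straightforward to log when the two disagree, which may be worth surfacing in the QC output.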

stefkary commented 3 days ago

Possibility to provide a "sample sheet" that specifies which samples & controls should be handled as FluA and FluB?

dfornika commented 3 days ago

Thanks for that suggestion @stefkary. I think the place we'd like to handle that would be our nextflow wrapper for FluViewer, which is here: https://github.com/BCCDC-PHL/fluviewer-nf

FluViewer itself is focused on analysis of a single sample at a time. The nextflow wrapper would handle running analysis on multiple samples. I've added an issue there: https://github.com/BCCDC-PHL/fluviewer-nf/issues/14

dfornika commented 2 days ago

OK @stefkary, we've added support for samplesheet input to our BCCDC-PHL/fluviewer-nf pipeline. We're currently only collecting the sample ID and the R1 and R2 Illumina FASTQ files through the samplesheet. Once we've incorporated the ability to specify "Flu A mode" or "Flu B mode" via command-line arguments here, we'll consider how to incorporate that into the samplesheet.

We'll also plan to support long (nanopore) read input via the samplesheet once that has been added here.
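
To make the samplesheet idea concrete, here is one way a per-sample flu type could be carried alongside the sample ID and R1/R2 paths. This is only a sketch: the column names `ID`, `R1`, `R2` and `FLU_TYPE` are assumptions for illustration and may not match what fluviewer-nf actually expects, and the `FLU_TYPE` column does not exist yet. A blank or missing value falls back to "auto".

```python
import csv
import io

VALID_FLU_TYPES = {"auto", "A", "B"}


def read_samplesheet(handle):
    """Read a CSV samplesheet with hypothetical ID, R1, R2, FLU_TYPE columns."""
    samples = []
    for row in csv.DictReader(handle):
        # A missing column or an empty cell both fall back to "auto".
        flu_type = (row.get("FLU_TYPE") or "").strip() or "auto"
        if flu_type not in VALID_FLU_TYPES:
            raise ValueError(f"{row['ID']}: unknown flu type {flu_type!r}")
        samples.append(
            {"id": row["ID"], "r1": row["R1"], "r2": row["R2"], "flu_type": flu_type}
        )
    return samples


# Example with an in-memory sheet; the file names are made up.
sheet = io.StringIO(
    "ID,R1,R2,FLU_TYPE\n"
    "s1,s1_R1.fastq.gz,s1_R2.fastq.gz,B\n"
    "s2,s2_R1.fastq.gz,s2_R2.fastq.gz,\n"
)
samples = read_samplesheet(sheet)
```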