Closed Codes1985 closed 8 months ago
Nice catch @Codes1985! I'll work on a fix.
This issue should be fixed in 3.3.7 with #66.
The BLAST report and all derived results should now show the proper segment number and name for IBV (Influenza B virus):
Sample | Sample Genome Segment Number | Reference NCBI Accession | Reference Subtype | Genus
---|---|---|---|---
SRR25375797 | 1_PB1 | OQ998010.1 | | Betainfluenzavirus
SRR25375797 | 2_PB2 | OR052894.1 | | Betainfluenzavirus
Thanks again @Codes1985 for catching and reporting the issue! Hopefully it didn't cause too much trouble with your submissions to NCBI! Please let me know if you have any other issues.
Thank you so much for fixing this so quickly, @peterk87! Yeah, not a big deal since we don't have too many FluB samples. I was basically last week years old when I discovered the segment numbering was different between FluA and FluB. I better turn in my Flu card! 😆 Thanks again and take care!
Is there an existing issue for this?
Description of the Bug/Issue
Hello!
We were in the process of preparing an upload of Influenza B sequences to GISAID, when we realized that nf-flu was incorrectly labelling PB1 as PB2 and PB2 as PB1.
As you know, the segment number is assigned based on segment length, where segment 1 is the longest segment and segment 8 the shortest. For FluA, PB2 is the longest segment and is assigned segment 1, while PB1 is the next longest and is assigned segment 2. It turns out that for FluB, PB1 is the longest segment, followed by PB2.
I noticed on line 32 in IRMA's init.sh script that this is accounted for:
SEG_NUMBERS="B_PB1:1,B_PB2:2,A_PB2:1,A_PB1:2,PA:3,HA:4,NP:5,NA:6,M:7,NS:8"
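To make the difference easier to see, here is a minimal Python sketch that parses the string above into a lookup table. The helper name `parse_seg_numbers` is hypothetical and is not part of IRMA or nf-flu; only the string itself comes from `init.sh`.

```python
from typing import Dict

# Copied from the IRMA init.sh line quoted above.
SEG_NUMBERS = "B_PB1:1,B_PB2:2,A_PB2:1,A_PB1:2,PA:3,HA:4,NP:5,NA:6,M:7,NS:8"


def parse_seg_numbers(spec: str) -> Dict[str, int]:
    """Parse 'NAME:NUMBER,...' into a dict (hypothetical helper, not IRMA code)."""
    mapping = {}
    for item in spec.split(","):
        name, number = item.split(":")
        mapping[name] = int(number)
    return mapping


seg_numbers = parse_seg_numbers(SEG_NUMBERS)
# PB1 and PB2 get different numbers depending on the A_/B_ prefix.
print(seg_numbers["A_PB2"], seg_numbers["A_PB1"])  # 1 2
print(seg_numbers["B_PB1"], seg_numbers["B_PB2"])  # 1 2
```

The `A_` and `B_` prefixes are what let IRMA number PB1 and PB2 differently for each type, while segments 3 to 8 share one entry.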
If my understanding of how nf-flu works is correct, the segment number is being appended by IRMA, while the segment ID is being applied by nf-flu based off IRMA's annotation in parse_influenza_blast_results.py:
Lines 29-38:
SEGMENT_NAMES = {
    1: "1_PB2",
    2: "2_PB1",
    3: "3_PA",
    4: "4_HA",
    5: "5_NP",
    6: "6_NA",
    7: "7_M",
    8: "8_NS",
}
and lines 481-484:
And since IRMA has appended "1" to the FluB PB1 sequence and "2" to the FluB PB2 sequence, the PB1 sequences are being renamed to "1_PB2" and the FluB PB2 sequences to "2_PB1".
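One way this could be avoided is a lookup that depends on the genus as well as the segment number. The sketch below is only an illustration of that idea, not the actual nf-flu code or fix: the names `FLU_A_SEGMENT_NAMES`, `FLU_B_SEGMENT_NAMES` and `segment_name` are hypothetical, and it assumes the genus of the reference is available as a string such as "Betainfluenzavirus".

```python
# Illustrative only: these names are hypothetical, not taken from nf-flu.
FLU_A_SEGMENT_NAMES = {
    1: "1_PB2", 2: "2_PB1", 3: "3_PA", 4: "4_HA",
    5: "5_NP", 6: "6_NA", 7: "7_M", 8: "8_NS",
}
# Influenza B swaps the first two segments; segments 3-8 are assumed
# to keep the same labels, as in IRMA's SEG_NUMBERS string.
FLU_B_SEGMENT_NAMES = {**FLU_A_SEGMENT_NAMES, 1: "1_PB1", 2: "2_PB2"}


def segment_name(segment_number: int, genus: str) -> str:
    """Return the segment label for a segment number and reference genus."""
    if genus == "Betainfluenzavirus":
        return FLU_B_SEGMENT_NAMES[segment_number]
    return FLU_A_SEGMENT_NAMES[segment_number]


print(segment_name(1, "Betainfluenzavirus"))   # 1_PB1
print(segment_name(1, "Alphainfluenzavirus"))  # 1_PB2
```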
Thank you!
Nextflow command-line
Error Message
Workflow Version
3.3.6, revision: e2872b8
Nextflow Executor
slurm
Nextflow Version
22.10.0
Java Version
No response
Hardware
HPC Cluster
Operating System (OS)
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core
Conda/Container Engine
Singularity
Additional context
No response