COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Error: no valid ID found for GFF record #937

Open duanshumeng opened 5 months ago

duanshumeng commented 5 months ago

The bug is primarily related to bulk mode.

Describe the bug: First, I built the index using the RefSeq reference genome, CDS, and annotation files. Then I ran the command to get gene-level quantification and got the following error:

[2024-06-09 20:52:47.925] [jointLog] [info] iteration = 3200 | max rel diff. = 0.0612643
[2024-06-09 20:52:48.138] [jointLog] [info] iteration = 3300 | max rel diff. = 0.0500664
[2024-06-09 20:52:48.352] [jointLog] [info] iteration = 3400 | max rel diff. = 0.0411332
[2024-06-09 20:52:48.569] [jointLog] [info] iteration = 3500 | max rel diff. = 0.0320723
[2024-06-09 20:52:48.782] [jointLog] [info] iteration = 3600 | max rel diff. = 0.0170861
[2024-06-09 20:52:48.997] [jointLog] [info] iteration = 3700 | max rel diff. = 0.218755
[2024-06-09 20:52:49.210] [jointLog] [info] iteration = 3800 | max rel diff. = 0.0130318
[2024-06-09 20:52:49.424] [jointLog] [info] iteration = 3900 | max rel diff. = 0.0255888
Error: no valid ID found for GFF record
[2024-06-09 20:52:49.648] [jointLog] [info] iteration = 4000 | max rel diff. = 0.088626
[2024-06-09 20:52:49.765] [jointLog] [info] iteration = 4050 | max rel diff. = 0.00732861
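The `Error: no valid ID found for GFF record` line appears in the middle of the optimizer log. One way to look into it is to list the GTF records that carry no non-empty `transcript_id`. The sketch below is not taken from salmon: the function name is hypothetical, and it assumes that the parser objects to records without a usable ID attribute and that record numbers count non-comment lines. Neither assumption is confirmed in this report.

```shell
# Hypothetical helper (not part of salmon): print "<record number>: <line>"
# for every GTF feature line whose 9th column has no non-empty transcript_id.
# Assumption: record numbers count non-comment lines; the real parser may
# count differently.
find_records_without_transcript_id() {
    awk -F '\t' '
        /^#/ { next }                 # skip header and comment lines
        {
            n++                       # running count of feature records
            if ($9 !~ /transcript_id "[^"]+"/)
                print n ": " $0
        }
    ' "$1"
}
```

For example, `find_records_without_transcript_id GCF_000001405.40_GRCh38.p14_genomic.gtf | head` would show the first few such records, which can then be compared with the record number in the error message.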

To Reproduce Steps and data to reproduce the behavior:

  1. The command to build index:

     threads=24
     genome=/cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/ref/RefSeq_ref/GCF_000001405.40_GRCh38.p14_genomic.fna
     transcriptome=/cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/ref/RefSeq_ref/GCF_000001405.40_GRCh38.p14_cds_from_genomic.rename.fna
     index=/cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/index/salmon/RefSeqindex
     grep "^>" ${genome} | cut -d " " -f 1 > ${index}/decoys.txt
     sed -i.bak -e 's/>//g' ${index}/decoys.txt
     cut -d "" -f 1 ${transcriptome} > ${index}/salmon.cdna.fa
     cat ${index}/salmon.cdna.fa ${genome} > ${index}/gentrome.fa.gz
     salmon index -t ${index}/gentrome.fa.gz -d ${index}/decoys.txt -i ${index} -p $threads
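The preparation steps in the command above (decoy list from the genome headers, then transcripts followed by the genome in one file) can be sketched as two small functions. This is a minimal sketch, not the reporter's script. The function names and the output file names `decoys.txt` and `gentrome.fa` inside the output directory are illustrative, both inputs are assumed to be uncompressed FASTA, and the output is written uncompressed and named accordingly. The `salmon index` step itself is left out.

```shell
# Hypothetical helpers for preparing decoy-aware index inputs.

# Print FASTA sequence names: the text after ">" up to the first space.
fasta_names() {
    grep '^>' "$1" | cut -d ' ' -f 1 | sed 's/^>//'
}

# $1 = genome FASTA, $2 = transcriptome FASTA, $3 = output directory
prepare_gentrome() {
    mkdir -p "$3"                              # ensure the directory exists
    fasta_names "$1" > "$3/decoys.txt"         # decoy names = genome sequence names
    # Transcripts first, then the genome. Headers are trimmed to the first
    # space-separated field so they match decoys.txt exactly; sequence
    # lines contain no spaces and pass through unchanged.
    { cut -d ' ' -f 1 "$2"; cut -d ' ' -f 1 "$1"; } > "$3/gentrome.fa"
}
```

Trimming both sets of headers the same way keeps the names in `decoys.txt` identical to the names in the combined file.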

  2. The command to get gene quantification:

     salmon quant -p 30 \
       -i /cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/index/salmon/RefSeq_index \
       -l IU \
       -1 /cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Quartet_RNAseq/Salmon/tmp/trimmed/D5_1_R1_trimmed.fq.gz \
       -2 /cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Quartet_RNAseq/Salmon/tmp/trimmed/D5_1_R2_trimmed.fq.gz \
       --validateMappings --gcBias --seqBias \
       -g /cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/ref/RefSeq_ref/GCF_000001405.40_GRCh38.p14_genomic.gtf \
       -o /cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Quartet_RNAseq/Salmon/results/salmon/D5_1
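Because the quantification command passes a GTF with `-g` for gene-level aggregation, it can help to check whether the transcript names used in the index also occur as `transcript_id` values in that GTF. The sketch below is a hypothetical helper, not a salmon feature. It assumes the mapping is keyed on the `transcript_id` attribute and that the transcript names are available one per line in a plain text file.

```shell
# Hypothetical check: count transcript names that never appear as a
# non-empty transcript_id in the GTF.
# $1 = non-empty file with one transcript name per line, $2 = GTF file
count_names_missing_from_gtf() {
    awk -F '\t' '
        NR == FNR { want[$1] = 1; next }      # first file: remember each name
        /^#/ { next }                         # GTF comment lines
        match($9, /transcript_id "[^"]+"/) {
            # strip the 15-character prefix (transcript_id plus space and
            # opening quote) and the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            delete want[id]                   # this name is covered
        }
        END { n = 0; for (k in want) n++; print n }
    ' "$1" "$2"
}
```

A result of `0` would mean every listed name has a matching `transcript_id`; a large count would suggest the FASTA headers and the annotation use different identifiers.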

Specifically, please provide at least the following information:

salmon = 1.10.1
Installed through bioconda
genome=/cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/ref/RefSeq_ref/GCF_000001405.40_GRCh38.p14_genomic.fna
transcriptome=/cpfs01/projects-HDD/cfff-e44ef5cf7aa5_HDD/dsm_23110700129/Reference/RNAseq_reference/ref/RefSeq_ref/GCF_000001405.40_GRCh38.p14_cds_from_genomic.rename.fna