Open majdabdul opened 1 year ago
Hi, I encountered the same issue. Have you managed to solve it?
Yes, but it was a while ago. I think I had realised that I aligned to the genome (Step 2 in my original post), when Salmon documentation specifically says you should align to the transcriptome. So I redid the alignment and it worked, if I'm remembering correctly.
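For anyone landing here with the same problem, a rough sketch of that realignment is below. It is not the exact command that was run back then: the use of minimap2, the `-N 100` value, the helper name `realign_and_quant`, and the output file names are all assumptions for illustration. The `salmon quant` options are the ones from the original post.

```shell
# Hypothetical helper: align ONT reads to the transcriptome FASTA (not the genome),
# then quantify with salmon in alignment-based mode.
# Assumes minimap2, samtools and salmon are on PATH.
realign_and_quant() {
    if [ "$#" -ne 3 ]; then
        echo "usage: realign_and_quant <transcriptome.fa> <reads.fastq> <outdir>" >&2
        return 2
    fi
    transcriptome=$1
    reads=$2
    outdir=$3
    mkdir -p "$outdir" || return 1

    # -a: SAM output; -x map-ont: Nanopore preset;
    # -N 100: keep up to 100 secondary alignments per read (assumed value)
    minimap2 -ax map-ont -N 100 "$transcriptome" "$reads" \
        | samtools view -b -o "$outdir/aln.bam" - || return 1

    # Same FASTA for alignment and quantification, so the names match.
    salmon quant --ont -t "$transcriptome" -l SF -a "$outdir/aln.bam" -o "$outdir/quant"
}
```

The point is that the FASTA given to the aligner and the FASTA given to `-t` are the same file, so every reference name in the BAM header is a transcript name.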
Hello, I hope you're well!
Context
gffread -w merged_transcriptome.fa -g chm13v2.0.fa merged_transcripts.gtf
Now I'm trying to quantify with Salmon.
Firstly, I'm not sure whether it's better to use the short or long reads here as input to Salmon, given that my goal is to identify short peptides (I have peptidomics data) derived from specific splicing events. You may or may not be able to help me with that, but if you have thoughts, I'd appreciate them! Anyway, I decided arbitrarily to use the long-read BAMs as input to Salmon.
Bug description
Secondly, as discussed a little in #104, I keep running into:
Transcript NM_032515.5 appears in the reference but did not appear in the BAM
and
Transcript chr19 appeared in the BAM header, but was not in the provided FASTA file
(Note that this is an entire chromosome. The only "transcripts" that don't appear in the FASTA are the chromosome names.) This happened regardless of whether I used the StringTie FASTA or the SQANTI-annotated FASTA. This is the salmon command I had run:
$salmon quant --ont -t $transcriptome -l SF -a $bam -o $outdir/$name
As suggested, I used gffread to generate a new transcriptome fasta as follows:
gffread -w salmon_fix.fa -g chm13v2.0.fa chm13v2.0_RefSeq_Liftoff_v5.1.gff3
I reran the above salmon command using this new fasta file, but got the same error and warnings: there are transcripts in the BAM not in the fasta and vice versa. Again, the "transcripts" not in the fasta were chromosome names.
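A quick way to see this kind of mismatch directly is to compare the reference names in the BAM header against the sequence names in the FASTA. The sketch below is only illustrative: the function names are made up, and it assumes the header has already been dumped to a text file with `samtools view -H`.

```shell
# Reference names declared in a SAM/BAM header dump (@SQ lines, SN: field).
header_names() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1" \
        | LC_ALL=C sort -u
}

# Sequence names in a FASTA: the first whitespace-delimited token after ">".
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -u
}

# Names present in the header but absent from the FASTA.
missing_from_fasta() {
    hdr=$(mktemp) || return 1
    fa=$(mktemp) || return 1
    header_names "$1" > "$hdr"
    fasta_names "$2" > "$fa"
    LC_ALL=C comm -23 "$hdr" "$fa"
    rm -f "$hdr" "$fa"
}

# Example (file names are placeholders):
#   samtools view -H "$bam" > header.txt
#   missing_from_fasta header.txt "$transcriptome"
```

If the output is a list of chromosome names such as chr19, the BAM was produced by aligning to the genome rather than to the transcriptome FASTA.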
I also tried with the short read SAM files, and still got the same error.
I'm not sure how to fix either the warnings or the errors and would really appreciate your help.
Software information
I used salmon v1.10.0 (though I also tried v1.10.1). For v1.10.0, I downloaded the pre-compiled binary; for v1.10.1, the admins of the HPC cluster installed it, and I'm not sure how. Regardless, all the runs were on HPC clusters running CentOS Linux (I use two different HPC clusters, depending on their availability).
Cluster1:
Cluster2:
Please let me know if I need to provide any more information. Thank you so much!