COMBINE-lab / salmon

šŸŸ šŸ£ šŸ± Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

@RG lines are at odds with binary encoded reference data Segmentation fault #323

Closed phickner closed 5 years ago

phickner commented 5 years ago

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?

I am using a version that was compiled in Trinity:

module load conda/trinity-2.8.4
source activate trinity-2.8.4
salmon --version
salmon 0.11.3

Describe the bug

@RG lines are at odds with binary encoded reference data Segmentation fault

To Reproduce

Steps and data to reproduce the behavior:

salmon quant -t Cp_cds.fa -l ISR -a SB1.bam -o salmon_SB1

Summary

I mapped the reads to a de novo transcriptome assembly using BWA-MEM. I then use Salmon in alignment-based mode to get counts for RNA-seq analysis. This worked well for the nine previous samples, but the last three samples (a different species and reference transcriptome assembly) throw the error above. Maybe there is a problem with the BAM file? If not, I am not sure how the read groups, etc., could be a problem, unless there is a problem with the FASTQ files. But they came from the same run, machine, and lane (Illumina NextSeq 500) as six other samples that have not caused problems.

rob-p commented 5 years ago

Hi @phickner,

Do you have a sample data-set you could share that replicates the issue? Salmon doesn't really use the @RG flag for anything, so I'd want to debug this by getting a backtrace and seeing what function is causing the segfault.

phickner commented 5 years ago

Hi Rob,

Thanks for getting back to me so quickly. I can send one of the BAM files (5.4 GB), the reference transcriptome (very small), and the code that I used. What is the best way to send you the BAM file? As I mentioned, it is only happening with the last three samples. I thought there might be a problem with the indexed reference file, but I remapped them today and I am getting the same issue. Is there a way to send a partial BAM? If so, please explain in detail. If you haven't figured it out yet, I am a biologist and not a computer scientist.

Thanks,
Paul
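On the partial-BAM question: one common approach is to subsample the file with `samtools view -s`, which keeps a random fraction of the reads and copies the header unchanged. The sketch below only builds that command. The function name `subsample_command`, the output file name, and the default fraction are illustrative assumptions, and nothing in this thread confirms that a subsample would still trigger the crash.

```python
def subsample_command(in_bam, out_bam, fraction=0.01, seed=42):
    """Build a `samtools view` command that keeps roughly `fraction` of the reads.

    Hypothetical helper for illustration; it does not run samtools itself.
    """
    if not 0.0001 <= fraction <= 0.9999:
        raise ValueError("fraction must be between 0.0001 and 0.9999")
    # samtools reads -s as SEED.FRACTION: the integer part is the random seed,
    # the decimal part is the fraction of reads to keep.
    frac_digits = f"{fraction:.4f}".split(".")[1]
    return [
        "samtools", "view",
        "-b",                            # write BAM output (the header is included)
        "-s", f"{seed}.{frac_digits}",   # e.g. "42.0100" keeps about 1% of reads
        "-o", out_bam,
        in_bam,
    ]


if __name__ == "__main__":
    cmd = subsample_command("SB1.bam", "SB1.subsample.bam", fraction=0.01)
    print(" ".join(cmd))
    # To actually run it (requires samtools on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Because the header is carried over as-is, a problem that lives in the header should survive subsampling, while a problem tied to specific reads may not.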

(In reply, by email, to Rob Patro's comment of Wed, Nov 28, 2018 at 1:22 PM.)

rob-p commented 5 years ago

Hi Paul,

Can you try using this link to upload the files? Maybe just zip up the bam and transcriptome together and upload that via the link? Let me know if this works for you.

--Rob

phickner commented 5 years ago

Hi Rob,

I attached the transcriptome to the email. I am uploading the BAM file to the link you sent. It might take a while. The attached transcriptome is small because we are interested in only the chemoreceptor genes.

Paul

(In reply, by email, to Rob Patro's comment of Wed, Nov 28, 2018 at 1:51 PM, in which "this link" pointed to https://script.google.com/a/cs.stonybrook.edu/macros/s/AKfycbxXxZZzRrl0g6KFXNOqG9fworPhmFdGJeNvSsqNTA/exec.)

rob-p commented 5 years ago

Hi Paul,

Did you reply to the GitHub issue via e-mail and attach the transcriptome there? In that case, it won't show up. If you post a response via the GitHub interface, you can just drag and drop the file into the text box to have it uploaded.

phickner commented 5 years ago

Here is the reference genome. I'm uploading the bam file now. Thanks, Paul

Cp_cds.txt

phickner commented 5 years ago

Attached is a screenshot, too.

[screenshot]

rob-p commented 5 years ago

Hi @phickner,

The error message seems to be coming from the library we use to parse the BAM file (https://github.com/jkbonfield/io_lib/blob/master/io_lib/bam.c#L329). Is it possible that the BAM itself is somehow ill-formed? Maybe as determined by Picard's ValidateSamFile or a similar tool?
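A quick header check can be done without extra tools. A BAM file stores its reference sequences twice: as `@SQ` lines in the text header and as a binary list whose length is the `n_ref` field. The sketch below reads both and reports the two counts. It assumes the error message refers to a disagreement of this kind, which the thread does not confirm, and `bam_header_counts` is a hypothetical helper name.

```python
import gzip
import struct


def bam_header_counts(path):
    """Return (number of @SQ lines in the text header, binary n_ref) for a BAM file."""
    # BGZF, the container used by BAM, is a series of gzip members,
    # so the standard gzip module can decompress it.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        (l_text,) = struct.unpack("<i", fh.read(4))   # length of the text header
        text = fh.read(l_text).decode("utf-8", errors="replace")
        (n_ref,) = struct.unpack("<i", fh.read(4))    # number of binary reference entries
    n_sq = sum(1 for line in text.split("\n") if line.startswith("@SQ\t"))
    return n_sq, n_ref


if __name__ == "__main__":
    import sys

    n_sq, n_ref = bam_header_counts(sys.argv[1])
    status = "consistent" if n_sq == n_ref else "MISMATCH"
    print(f"@SQ lines: {n_sq}, binary references: {n_ref} ({status})")
```

If the two counts differ, the header was probably damaged or edited after alignment. Matching counts do not prove the file is valid, so a full validator is still worth running.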

rob-p commented 5 years ago

Hi @phickner,

Any update on this? How does the BAM file look under ValidateSamFile or some such?

phickner commented 5 years ago

My issue was resolved. Thanks.

(In reply, by email, to Rob Patro's comment of Sun, Dec 30, 2018 at 12:07 PM.)