bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

RNA-Seq salmon error #3626

Closed keenhl closed 2 years ago

keenhl commented 2 years ago

I am running the RNA-Seq pipeline on bcbio 1.2.4. The following is a snippet of my config file.

- algorithm:
    aligner: hisat2
    quality_format: Standard
  analysis: RNA-seq
  description: a
  files:
  -  a_R1.fq.gz
  -  a_R2.fq.gz
  genome_build: hg38
fc_name: samples
resources:
  qualimap:
    memory: 6G
upload:
  dir: ../final

I get the following error. I should also mention that I know this pipeline works, as I successfully ran a batch of human samples through it just last week. I am not sure what the difference is, except that the files in this batch are very large.

salmon was only able to assign 0 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample. You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1). ' returned non-zero exit status 1.

I have a few questions.

First, any thoughts about the error? These are human samples run against human hg38.

Second, I don't even care about the salmon results. I just need the bam files created from the HISAT2 alignment, so is there a way for me to bypass this error and just get the bam files? Similarly, is there a way for me to run the RNA-seq pipeline and get the HISAT2 bam files without all the other steps?

Third, what is the difference between the bam files currently in the work directory and the final bam files? In other words, can I just use these bam files? Probably not, but I thought I might ask.

Thank you very much for your help.

X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead
├── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-novelsplicesites.bed
├── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-sort.bam
├── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-sort.bam.bai
├── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-sort.nsorted.bam
├── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-sort.nsorted.primary.bam
└── X9dab07e6-38b4-4fc8-8b53-21c47f1b8431_gdc_realn_rehead-splicejunctions.bed
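For a run with many samples, the coordinate-sorted BAMs in a listing like the one above can be picked out by suffix. The following is only a rough sketch: the `-sort.bam` and `.bai` suffixes are taken from the listing, while the function name `find_sorted_bams` and the directory passed to it are hypothetical.

```python
from pathlib import Path


def find_sorted_bams(align_dir):
    """Return (bam_path, has_index) pairs for every *-sort.bam under align_dir.

    Files such as *-sort.nsorted.bam do not end in "-sort.bam",
    so the pattern skips them.
    """
    results = []
    for bam in sorted(Path(align_dir).rglob("*-sort.bam")):
        # The index sits next to the BAM with ".bai" appended to the full name.
        index = bam.with_name(bam.name + ".bai")
        results.append((bam, index.exists()))
    return results
```

Calling `find_sorted_bams("path/to/work/dir")` (a placeholder path) returns one entry per sample, and a `False` flag marks a BAM whose index is missing.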

keenhl commented 2 years ago

Based on checking files from a previous run, the sort.bam file in the work directory seems to be the same as the ready.bam file in the final directory.
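One way to make the "seems to be the same" check concrete is to compare checksums of the two files. This is a minimal sketch under the assumption that the two files are byte-for-byte copies; the helper names `sha256_of` and `same_content` are hypothetical, and the paths are whatever the run actually produced.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical bytes (as judged by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A matching digest means the files are identical. A mismatch does not by itself prove the alignments differ, since headers or compression could vary between otherwise equivalent BAMs.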

naumenko-sa commented 2 years ago

Hi @keenhl !

1) Please note that your bcbio is 1.2.4, while the current version is 1.2.9. Many bugs have been fixed since 1.2.4, so I would suggest upgrading.
2) This sample probably needs more QC: what is the % aligned, does kraken report any contamination, and what does the fastqc report show?
3) Unfortunately, turning salmon off is not possible; getting salmon counts is one of the main goals of the bcbio RNA-seq pipeline.
4) Yes, the sort bam is copied as ready.bam in the final directory.
5) If your samples are just larger, you may try to increase the RAM for the run as well.

SN
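For the "% aligned" question in point 2, one option is to run `samtools flagstat` on the sorted BAM and read the mapped fraction from its text output. The sketch below only parses that text; it assumes the usual flagstat line layout (`N + M in total ...` and `N + M mapped (...)`), and the function name `mapped_percent` is hypothetical.

```python
import re


def mapped_percent(flagstat_text):
    """Return the percentage of QC-passed reads that are mapped.

    Expects the plain-text output of `samtools flagstat`.
    """
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    # Anchoring at line start avoids matching the "primary mapped" line.
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.MULTILINE)
    if total is None or mapped is None:
        raise ValueError("text does not look like samtools flagstat output")
    total_reads = int(total.group(1))
    if total_reads == 0:
        return 0.0
    return 100.0 * int(mapped.group(1)) / total_reads
```

A value close to zero would be consistent with salmon assigning no fragments and would point at the input reads rather than at salmon itself.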

keenhl commented 2 years ago

Thanks for taking time to respond to my issue. I really appreciate it.

Regarding your suggestions

  1. Yes, you are right, upgrading is the right thing to do. I didn't want to upgrade yet for reproducibility (i.e., I ran other related samples using version 1.2.4, so I want to run these samples with the same version).

  2. Ok, I will look into this. Thanks.

  3. These two things are not mutually exclusive. Salmon counts are one goal of the pipeline, but it would still be possible to offer an option to turn Salmon off for users who only want the alignments.

  4. Thanks for confirming this.

  5. This is a good idea given that the samples are large. Thanks.