bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

miRNA counts are empty #3368

Closed naumenko-sa closed 3 years ago

naumenko-sa commented 3 years ago

Hello everyone and especially @lpantano !

One of the bcbio users on O2 hit an error with the bcbio template:

upload:
  dir: ../final
details:
  - analysis: smallRNA-seq
    algorithm:
      aligner: star # any other aligner is supported.
      # change the adapter according to the project
      adapters: ["TGGAATTCTCGGGTGC"] 
      expression_caller: [trna, seqcluster, mirdeep2]
      # expression_caller: [trna, seqcluster, mirdeep2, mirge] Read docs to know how to use
      # miRge tools: https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq
      species: mmu
    genome_build: mm10
#resources:
#  atropos: 
#    options: ["-u 4", "-u -4"]
#  mirge: 
#    options: ["-lib $PATH_TO_LIBS_FOLDER"]

The run was successful, but there are no miRNA counts. The files final/sample/mirbase-ready.counts are empty.

The log files contain a 'mirdeep2 failed' message.

Let us know if you have any thoughts! Sergey

lpantano commented 3 years ago

Hmm, have you checked whether the trimmed files contain a reasonable number of reads? That is normally the first place to look. Let me know if the trimmed files look good.

naumenko-sa commented 3 years ago

Thanks, @lpantano !

Yes, these counts look too low, right?

$ cat project/work/trimmed/*/log/run.log | grep Writing | awk -F '/home' '{print $1}'
INFO-seqbuster(102): Writing 2369 sequences to 
INFO-seqbuster(102): Writing 2743 sequences to 
INFO-seqbuster(102): Writing 1718 sequences to 
INFO-seqbuster(102): Writing 2677 sequences to 
INFO-seqbuster(102): Writing 2418 sequences to 
INFO-seqbuster(102): Writing 1994 sequences to 
INFO-seqbuster(102): Writing 1984 sequences to 
INFO-seqbuster(102): Writing 1989 sequences to 
INFO-seqbuster(102): Writing 2561 sequences to 
INFO-seqbuster(102): Writing 3454 sequences to 
INFO-seqbuster(102): Writing 2345 sequences to 
INFO-seqbuster(102): Writing 2878 sequences to 
INFO-seqbuster(102): Writing 1595 sequences to 
INFO-seqbuster(102): Writing 2200 sequences to 
INFO-seqbuster(102): Writing 2050 sequences to

The raw input is ~30 million reads per sample.

So something is wrong with the adapter sequence, probably?
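
One quick way to test this is to count how many raw reads contain the configured adapter at all. The following is a minimal sketch, not part of bcbio: the function names are hypothetical, it assumes standard four-line FASTQ records, and it only looks for exact matches of the full adapter string.

```python
import gzip


def adapter_fraction(lines, adapter, max_reads=100_000):
    """Fraction of the first `max_reads` reads that contain `adapter`.

    `lines` is any iterable of FASTQ lines (4 lines per record);
    only exact substring matches are counted.
    """
    total = hits = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        total += 1
        if adapter in line:
            hits += 1
        if total >= max_reads:
            break
    return hits / total if total else 0.0


def adapter_fraction_in_file(path, adapter, max_reads=100_000):
    """Same check on a FASTQ file, plain or gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return adapter_fraction(handle, adapter, max_reads)
```

For example, `adapter_fraction_in_file("sample_R1.fastq.gz", "TGGAATTCTCGGGTGC")` (the file name is a placeholder) gives the fraction for the adapter in the template above. A value close to zero would be consistent with either a different adapter or inserts longer than the read length.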

Sergey

lpantano commented 3 years ago

Normally that is the issue, or the library is not really small RNA. Do you have more information about the kit and protocol used?

lramsdell commented 3 years ago

@lpantano The people who did the study used the mirVana miRNA isolation kit, but they were not intending to target miRNA transcripts. Is it possible that the protocol used to create the sequencing libraries, or the read lengths, make it impossible to detect miRNAs? If not, is there a parameter we can change to correctly indicate the adapter sequence?

lpantano commented 3 years ago

If there are few reads with that adapter, then they probably didn’t pick up small RNA, at least not molecules like miRNAs. There is not a lot to do in that case.

You can try to predict the adapter in case it matches a known one. DNApi is an option.
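
As a rough first look before running a dedicated predictor, one can list the k-mers that occur in the largest share of reads. This is a deliberately naive sketch and not what adapter-prediction tools actually do: `top_kmers` is a hypothetical helper, it assumes four-line FASTQ records, and a very abundant small RNA could rank as high as an adapter.

```python
from collections import Counter


def top_kmers(lines, k=12, max_reads=50_000, n=5):
    """Return the `n` k-mers present in the largest fraction of reads.

    `lines` is an iterable of FASTQ lines (4 lines per record).
    Each k-mer is counted at most once per read.
    """
    counts = Counter()
    reads = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip()
        # A set keeps repeats within one read from inflating the count.
        counts.update({seq[j:j + k] for j in range(len(seq) - k + 1)})
        reads += 1
        if reads >= max_reads:
            break
    return [(kmer, c / reads) for kmer, c in counts.most_common(n)]
```

If one k-mer shows up in most reads and its start position varies from read to read, it is a candidate for the start of the 3' adapter and can be compared against the adapters of common small RNA kits.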

Sorry I can't help more.