bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Disabling adapter trimming in smallrnaseq pipeline #2247

Closed: mshadbolt closed this issue 6 years ago

mshadbolt commented 6 years ago

Hi guys, I'd like to run the srnaseq pipeline on reads that have already been trimmed. I tried to disable trimming by specifying trim_reads: false in my .yaml config file, but the pipeline still enters adapter trimming. Furthermore, I hit an IndexError while DNApi-1.1 is running.

I am using the latest dev version of the pipeline.

Is there a way to disable the adapter trimming step in the srnaseq pipeline so I can avoid this error? The error output and YAML file are below:

bcbio_nextgen.py \
  $LOCAL_RESOURCES \
  ../config/srnaseq_test.yaml \
  --numcores 1
[2018-02-01T18:18Z] System YAML configuration: /projects/karsanlab/mshadbolt/KARSANBIO-28-AML_PMP_Integration/KARSANBIO-1331_bcbio_snrnaseq_pmp/lib/bcbio_system.clingen01.yaml
[2018-02-01T18:18Z] Resource requests: atropos, picard; memory: 3.60, 3.60; cores: 32, 32
[2018-02-01T18:18Z] Configuring 1 jobs to run, using 1 cores each with 3.60g of memory reserved for each job
[2018-02-01T18:18Z] Timing: organize samples
[2018-02-01T18:18Z] multiprocessing: organize_samples
[2018-02-01T18:18Z] Using input YAML configuration: /projects/karsanscratch/mshadbolt/bcbio_srnaseq_test/srnaseq_test/config/srnaseq_test.yaml
[2018-02-01T18:18Z] Checking sample YAML configuration: /projects/karsanscratch/mshadbolt/bcbio_srnaseq_test/srnaseq_test/config/srnaseq_test.yaml
[2018-02-01T18:18Z] Testing minimum versions of installed programs
[2018-02-01T18:18Z] multiprocessing: prepare_sample
[2018-02-01T18:18Z] Preparing sample1
[2018-02-01T18:18Z] Preparing sample2
[2018-02-01T18:18Z] Preparing sample3
[2018-02-01T18:18Z] Timing: adapter trimming
[2018-02-01T18:18Z] multiprocessing: trim_srna_sample
Traceback (most recent call last):
  File "/projects/rdocking_prj/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 241, in <module>
    main(**kwargs)
  File "/projects/rdocking_prj/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
    fc_dir, run_info_yaml)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 312, in smallrnaseqpipeline
    samples = rnaseq_prep_samples(config, run_info_yaml, parallel, dirs, samples)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 400, in rnaseq_prep_samples
    samples = run_parallel("trim_srna_sample", samples)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 52, in wrapper
    return apply(f, *args, **kwargs)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 102, in trim_srna_sample
    return srna.trim_srna_sample(*args)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/sample.py", line 56, in trim_srna_sample
    adapters = adapter if adapter else _dnapi_prediction(in_file, out_dir)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/bcbio/srna/sample.py", line 151, in _dnapi_prediction
    iterative_result = iterative_adapter_prediction(end_file, [1.2, 1.3, 1.4, 1.7, 2], [7, 11], 500000)
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/DNApi-1.1-py2.7.egg/dnapilib/apred.py", line 45, in iterative_adapter_prediction
  File "/projects/rdocking_prj/software/bcbio-nextgen/data/anaconda/lib/python2.7/site-packages/DNApi-1.1-py2.7.egg/dnapilib/kmer.py", line 38, in filter_kmers
IndexError: list index out of range

yaml:

details:
  - analysis: smallRNA-seq
    algorithm:
      trim_reads: false
      aligner: star
      # change adapter according project
      # adapters: ["TGGAATTCTCGGGTGC"] 
      expression_caller: [seqbuster, trna, seqcluster]
      species: hsa
    genome_build: hg19
upload:
  dir: ../final
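
The traceback ends in DNApi's filter_kmers with "IndexError: list index out of range", which is what Python raises when code indexes into an empty list. One plausible reading, offered as an assumption and not confirmed anywhere in this thread, is that reads whose adapters were already removed leave no enriched 3' k-mer for the predictor to select. The toy functions below are a hypothetical illustration of that failure mode only: top_enriched_kmer, safe_top_enriched_kmer, the k and min_fraction parameters, and the 20% threshold are all made up for the example, and none of this is DNApi's actual code.

```python
# Hypothetical illustration only: NOT DNApi's filter_kmers implementation.
# It shows how a k-mer filter that indexes into its own result raises
# IndexError when no k-mer survives the filter.
from collections import Counter


def top_enriched_kmer(reads, k=7, min_fraction=0.2):
    """Return the most common k-mer found at the 3' end of the reads.

    Raises IndexError when no k-mer reaches min_fraction of all reads,
    e.g. (by assumption) on reads that were already adapter-trimmed.
    """
    counts = Counter(read[-k:] for read in reads if len(read) >= k)
    total = len(reads)
    # most_common() yields (kmer, count) pairs, highest count first
    kept = [kmer for kmer, n in counts.most_common()
            if n / float(total) >= min_fraction]
    return kept[0]  # "list index out of range" when kept is empty


def safe_top_enriched_kmer(reads, k=7, min_fraction=0.2):
    """Same lookup, but return None instead of raising IndexError."""
    try:
        return top_enriched_kmer(reads, k, min_fraction)
    except IndexError:
        return None


# Reads that still carry a shared 3' sequence: one k-mer dominates.
untrimmed = ["ACGTTGGAATTC", "GGCATGGAATTC", "TTTATGGAATTC"]
print(top_enriched_kmer(untrimmed))  # GGAATTC

# Reads with no shared 3' sequence: nothing passes the 20% filter.
trimmed = ["ACGTACGTAC", "TTGACCATGA", "GGCCTTAAGG",
           "CAGTCAGTCA", "ATATCGCGAT", "TGCATGCATG"]
print(safe_top_enriched_kmer(trimmed))  # None
```

The guarded variant simply converts the empty-result case into None so a caller can decide to skip trimming rather than crash.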

mshadbolt commented 6 years ago

I found that by specifying the adapter (uncommenting the adapters: line), the pipeline doesn't start de novo adapter discovery with DNApi and then successfully skips the trimming step.
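
The traceback shows the relevant line in bcbio/srna/sample.py: "adapters = adapter if adapter else _dnapi_prediction(in_file, out_dir)". The minimal sketch below mirrors that conditional to show why supplying adapters avoids DNApi entirely. It assumes a plain dict standing in for the algorithm: block of the YAML and a stand-in predictor; choose_adapters and predict_adapters are hypothetical names, not bcbio's API.

```python
# Minimal sketch of the adapter-selection conditional seen in the traceback.
# choose_adapters / predict_adapters are hypothetical stand-ins, not bcbio code.


def choose_adapters(algorithm, predict_adapters):
    """Return adapters from the config, else fall back to de novo prediction.

    algorithm: dict mirroring the "algorithm:" block of the sample YAML.
    predict_adapters: zero-argument callable standing in for DNApi.
    """
    adapter = algorithm.get("adapters")
    # A non-empty adapters list short-circuits the conditional, so the
    # predictor (and any IndexError it might raise) is never reached.
    # In this sketch trim_reads is not consulted at all.
    return adapter if adapter else predict_adapters()


def fake_predict():
    # Stand-in for DNApi: here it just returns a placeholder result.
    return ["PREDICTED"]


with_adapter = {"trim_reads": False, "adapters": ["TGGAATTCTCGGGTGC"]}
without_adapter = {"trim_reads": False}
print(choose_adapters(with_adapter, fake_predict))     # ['TGGAATTCTCGGGTGC']
print(choose_adapters(without_adapter, fake_predict))  # ['PREDICTED']
```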

This is enough of a fix for me, but I will leave the issue open in case the developers see this as a bug: I'd expect that setting trim_reads to false skips adapter trimming regardless of whether adapters are specified.
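
A minimal sketch of the behavior described here, where trim_reads is checked before any adapter handling, so neither configured adapters nor de novo prediction matter once trimming is off. This is a hypothetical illustration, not the fix that was pushed to bcbio; resolve_trimming, predict_adapters, and the assumption that trimming defaults to on when trim_reads is absent are all illustrative.

```python
# Hypothetical sketch of the expected behavior, not bcbio's actual fix.
# Assumption for illustration: trimming is on when trim_reads is absent.


def resolve_trimming(algorithm, predict_adapters):
    """Decide whether to trim and which adapters to use.

    Returns a (should_trim, adapters) tuple. When trim_reads is false,
    adapters is None and predict_adapters is never called.
    """
    if not algorithm.get("trim_reads", True):
        # trim_reads: false short-circuits before any adapter lookup.
        return False, None
    adapters = algorithm.get("adapters")
    if not adapters:
        # Only fall back to de novo prediction when trimming is wanted
        # and no adapters were configured.
        adapters = predict_adapters()
    return True, adapters


def fake_predict():
    # Stand-in for a DNApi-style predictor.
    return ["PREDICTED"]


print(resolve_trimming({"trim_reads": False}, fake_predict))  # (False, None)
print(resolve_trimming({"adapters": ["TGGAATTCTCGGGTGC"]}, fake_predict))
# (True, ['TGGAATTCTCGGGTGC'])
```

Checking the flag first means a failure inside adapter prediction can no longer affect runs on pre-trimmed reads.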

lpantano commented 6 years ago

Hi Marion,

Thanks so much for reporting this. I just pushed a fix.

Cheers