bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Methylation pipeline #3700

Open Adrian-Zet opened 1 year ago

Adrian-Zet commented 1 year ago

Version info

To Reproduce

Exact bcbio command you have used:

bcbio_nextgen.py ../config/CTRL-MDD-S-MDD.yaml -n 72

Your yaml configuration file:

A few observations:
The YAML file is very long because this study has 182 samples.
I therefore excluded most of the entries, since they repeat the same information for every sample (the omitted part is marked with "..................").

details:
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_087
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190430_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190430_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_086
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190431_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190431_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_089
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190432_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190432_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: male
.............................................................................
(excluded most samples from this point to the end for simplicity)
..............................................................................

- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_082
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190791_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190791_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: female
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: HealthyControl_M_085
  files:
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190792_1.fastq.gz
  - /export/home/math/saia/adrz/MDD-MDDS-CTRL/pysradb_downloads/SRP200298/SRR/methylation/SRR9190792_2.fastq.gz
  genome_build: hg38
  metadata:
    group: CTRL
    sex: female
fc_name: ctrl-vs-mdd-vs-mdds-mRNA-Methylation
resources:
  bismark:
    bismark_threads: 16
    bowtie_threads: 2
  trim_galore:
    options:
    - --clip_r1 4
    - --clip_r2 4
    - --three_prime_clip_r1 4
    - --three_prime_clip_r2 4
upload:
  dir: ../final

Log files (can be found in work/log). Please attach (10MB max):
The debug.log is huge (250 MB) because of the size of the workflow. If required, I can either compress it or run the workflow with just one sample and attach that debug.log instead.

bcbio-nextgen.log bcbio-nextgen-commands.log

Expected behavior:

Resulting behavior:

naumenko-sa commented 1 year ago

Hi @Adrian-Zet! The methylation pipeline does not support parallelization with IPython. Please run one bcbio project per sample, or per small group of samples. SN
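A minimal bash sketch of "one bcbio project per sample", assuming a hypothetical tab-separated sample sheet (columns: description, group, sex, R1 FASTQ, R2 FASTQ) rather than the original 182-sample YAML. The function names `write_sample_config` and `run_per_sample` and the `DRY_RUN` switch are illustrative, not part of bcbio. The YAML keys, the thread counts (16/2) and the trim_galore options are copied from the reporter's config. By default the script only prints the bcbio command for each sample.

```shell
#!/usr/bin/env bash
# Sketch: split one multi-sample WGBS run into one bcbio project per sample.
# Hypothetical input: a TSV sample sheet with columns
#   description  group  sex  fastq_r1  fastq_r2
# Each sample gets <description>/config and <description>/work, mirroring the
# ../config and ../final layout of the original command.
set -euo pipefail

# Write a single-sample bcbio config with the same keys as the original YAML.
# Thread counts and trim_galore options are copied from the original config.
write_sample_config() {
  local desc=$1 group=$2 sex=$3 r1=$4 r2=$5 out=$6
  cat > "$out" <<EOF
details:
- algorithm:
    aligner: bismark
  analysis: wgbs-seq
  description: ${desc}
  files:
  - ${r1}
  - ${r2}
  genome_build: hg38
  metadata:
    group: ${group}
    sex: ${sex}
fc_name: ${desc}
resources:
  bismark:
    bismark_threads: 16
    bowtie_threads: 2
  trim_galore:
    options:
    - --clip_r1 4
    - --clip_r2 4
    - --three_prime_clip_r1 4
    - --three_prime_clip_r2 4
upload:
  dir: ../final
EOF
}

# For each sheet row: create the project dirs, write the config, then print
# (DRY_RUN=1, the default) or actually run bcbio inside <description>/work.
run_per_sample() {
  local sheet=$1 cores=${2:-8}
  local desc group sex r1 r2
  # "|| [[ -n ... ]]" also processes a last line without a trailing newline.
  while IFS=$'\t' read -r desc group sex r1 r2 || [[ -n "${desc:-}" ]]; do
    [[ -z "$desc" || "$desc" == \#* ]] && continue   # skip blanks/comments
    mkdir -p "$desc/config" "$desc/work"
    write_sample_config "$desc" "$group" "$sex" "$r1" "$r2" "$desc/config/$desc.yaml"
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "cd $desc/work && bcbio_nextgen.py ../config/$desc.yaml -n $cores"
    else
      (cd "$desc/work" && bcbio_nextgen.py "../config/$desc.yaml" -n "$cores")
    fi
  done < "$sheet"
}
```

Usage: `DRY_RUN=1 run_per_sample samples.tsv 8` lists one command per sample, which you can then submit as separate jobs (or group a few samples per sheet, as suggested above).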

naumenko-sa commented 1 year ago

Bismark parallelization is tricky; see the table at the bottom of https://bcbio-nextgen.readthedocs.io/en/latest/contents/methylation.html. Running times can range from 2 hours to 3+ days depending on the settings.

I am not surprised that -n 72 is not working. I'd start with safer settings for one sample: -n 8, bismark/bowtie threads 4/2 and 50G RAM, and go from there. If that works for you, maybe increase to -n 16, bismark/bowtie threads 8/2 and 100G RAM.
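A bash sketch that encodes the two settings tiers from the comment above (start: -n 8, 4/2 threads, 50G; scale-up: -n 16, 8/2 threads, 100G) for a single-sample config. Only `bismark_threads` and `bowtie_threads`, the keys already used in this issue's config, are written to YAML. The sketch assumes that memory is requested through whatever launches bcbio (for example, scheduler job options), so it only echoes the RAM figure. The function names `tier_settings`, `resources_yaml` and `plan_run` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the two settings tiers suggested for ONE sample per project.
# Memory is NOT written to the YAML here; requesting 50G/100G is assumed
# to be handled by your launcher (e.g. scheduler options) and is only echoed.
set -euo pipefail

# Print "cores bismark_threads bowtie_threads memory" for a named tier.
tier_settings() {
  case $1 in
    start) echo "8 4 2 50G" ;;
    scale) echo "16 8 2 100G" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Emit a resources: block; trim_galore options copied from the original config.
resources_yaml() {
  local bismark_threads=$1 bowtie_threads=$2
  cat <<EOF
resources:
  bismark:
    bismark_threads: ${bismark_threads}
    bowtie_threads: ${bowtie_threads}
  trim_galore:
    options:
    - --clip_r1 4
    - --clip_r2 4
    - --three_prime_clip_r1 4
    - --three_prime_clip_r2 4
EOF
}

# Print the memory to request, the bcbio command and the resources block.
plan_run() {
  local tier=$1 config=$2 settings cores bt bw mem
  settings=$(tier_settings "$tier") || return 1
  read -r cores bt bw mem <<< "$settings"
  echo "# memory to request: $mem"
  echo "bcbio_nextgen.py $config -n $cores"
  resources_yaml "$bt" "$bw"
}
```

Usage: run `plan_run start sample.yaml` first. Switch to `plan_run scale sample.yaml` only after the smaller run completes within acceptable time and memory.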

SN