gagneurlab / drop

Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
MIT License

error in aberrantExpression module #516

Closed: jfertaj closed this issue 9 months ago

jfertaj commented 9 months ago

Hi, I am trying to run the aberrantExpression module, but it fails with a CPU time error. Here is the error message:

Wed Feb  7 07:47:18 2024: SizeFactor estimation ...
Wed Feb  7 07:47:18 2024: Controlling for confounders ...
Wed Feb  7 07:47:18 2024: Using the autoencoder implementation for controlling.
[1] "Wed Feb  7 07:47:22 2024: Initial PCA loss: 11.6348183713699"

Stop worker failed with the error: reached CPU time limit
Error: BiocParallel errors
  1 remote errors, element index: 1
  3 unevaluated and other errors
  first remote error:
Error: BiocParallel errors
  1 remote errors, element index: 11988
  7431 unevaluated and other errors
  first remote error:
Error in optim(pari, fn = truncLogLiklihoodD, gr = gradientD, k = ki, : L-BFGS-B needs finite values of 'fn'

Execution halted

This is my config file:

projectTitle: "DROP: Detection of RNA Outliers Pipeline"
root: /exports/igmm/eddie/tomlinson-CRC-promethion/jfertaj/Maria_Eugenia/RNA_SEQ_FIS/DROP/Output            # root directory of all output objects and tables
htmlOutputPath: /exports/igmm/eddie/tomlinson-CRC-promethion/jfertaj/Maria_Eugenia/RNA_SEQ_FIS/DROP/Output/html   # path for HTML rendered reports
indexWithFolderName: true # whether the root base name should be part of the index name

hpoFile: null  # if null, downloads it from webserver
sampleAnnotation: /exports/igmm/eddie/tomlinson-CRC-promethion/jfertaj/Maria_Eugenia/RNA_SEQ_FIS/sample_annotation.txt # path to sample annotation (see documentation on how to create it)

geneAnnotation:
    v39: /exports/igmm/eddie/tomlinson-eQTL/reference/gencode.v39.GRCh38.annotation.gtf
genomeAssembly: hg38
genome: # path to reference genome sequence in fasta format.
    ncbi: /exports/igmm/eddie/tomlinson-eQTL/reference/Homo_sapiens_assembly38.fasta # You can define multiple reference genomes in yaml format, ncbi: path/to/ncbi, ucsc: path/to/ucsc
    # the keywords that define the path should be in the GENOME column of the sample annotation table

exportCounts:
    # specify which gene annotations to include and which
    # groups to exclude when exporting counts
    geneAnnotations:
        - v39
    excludeGroups: null

aberrantExpression:
    run: true
    groups:
        - group1
    fpkmCutoff: 1
    implementation: autoencoder
    padjCutoff: 0.05
    zScoreCutoff: 0
    genesToTest: null
    maxTestedDimensionProportion: 3
    yieldSize: 2000000

aberrantSplicing:
    run: true
    groups:
        - group1
    recount: false
    longRead: false
    keepNonStandardChrs: false
    filter: true
    minExpressionInOneSample: 20
    quantileMinExpression: 10
    minDeltaPsi: 0.05
    implementation: PCA
    padjCutoff: 0.1
    maxTestedDimensionProportion: 6
    genesToTest: null
    ### FRASER1 configuration
    #FRASER_version: "FRASER"
    #deltaPsiCutoff : 0.3
    #quantileForFiltering: 0.95
    ### For FRASER2, use the following parameters instead of the 3 lines above:
    FRASER_version: "FRASER2"
    deltaPsiCutoff : 0.1
    quantileForFiltering: 0.75

mae:
    run: true
    groups:
        - group1
    gatkIgnoreHeaderCheck: true
    padjCutoff: 0.05
    allelicRatioCutoff: 0.8
    addAF: true
    maxAF: 0.001
    maxVarFreqCohort: 0.05
    # VCF-BAM matching
    qcVcf: /exports/igmm/eddie/tomlinson-eQTL/reference/qc_vcf_1000G_GRCh38.vcf.gz
    qcGroups:
        - group1
    dnaRnaMatchCutoff: 0.85

rnaVariantCalling:
    run: false
    groups:
        - group1
    highQualityVCFs:
        - /exports/igmm/eddie/tomlinson-eQTL/reference/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
        - /exports/igmm/eddie/tomlinson-eQTL/reference/1000G_phase1.snps.high_confidence.hg38.vcf.gz
    dbSNP: /exports/igmm/eddie/tomlinson-eQTL/reference/dbSNP/b155/hg38/dbSNP155.hg38_withCHR.vcf.gz
    repeat_mask: /exports/igmm/eddie/tomlinson-eQTL/reference/hg38_repeatMasker_sorted.bed
    createSingleVCF: true
    addAF: true
    maxAF: 0.001
    maxVarFreqCohort: 0.05
    hcArgs: ""
    minAlt: 3
    yieldSize: 100000

tools:
    gatkCmd: gatk
    bcftoolsCmd: bcftools
    samtoolsCmd: samtools

Is there any parameter I can add to increase the time limit?

Thanks

Note: I have run the same command with just one core. The CPU time limit error disappears, but the BiocParallel errors remain.

vyepez88 commented 9 months ago

Hi, this error (L-BFGS-B needs finite values of 'fn') usually indicates that some counts are too low. Did you check the Counting_summary script? What do the total counts or size factors look like?
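To check the total counts and size factors the maintainer asks about, here is a minimal sketch. It computes size factors with the median-of-ratios method used by DESeq2, which is assumed here to match how OUTRIDER estimates them, and flags samples that fall below a cutoff. The function names and the 0.1 threshold are hypothetical. In a real DROP run you would load the gene count matrix produced by the pipeline instead of the toy Python lists used here.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios method.

    counts[j][i] is the read count of gene i in sample j. Genes with a zero
    count in any sample are skipped because their geometric mean is zero.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for i in range(n_genes):
        values = [sample[i] for sample in counts]
        if min(values) > 0:
            # log of the geometric mean of gene i across all samples
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)  # excluded from the ratios
    if all(lg is None for lg in log_geo_means):
        raise ValueError("no gene has a non-zero count in every sample")

    factors = []
    for sample in counts:
        # log ratio of this sample's count to the gene's geometric mean
        log_ratios = [
            math.log(sample[i]) - lg
            for i, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def low_size_factor_samples(sample_ids, counts, min_size_factor=0.1):
    """Return (sample_id, total_counts, size_factor) for samples below the cutoff.

    min_size_factor is a hypothetical threshold, not a DROP parameter.
    """
    factors = median_of_ratios_size_factors(counts)
    return [
        (sid, sum(sample), sf)
        for sid, sample, sf in zip(sample_ids, counts, factors)
        if sf < min_size_factor
    ]


if __name__ == "__main__":
    ids = ["S1", "S2", "S3"]
    counts = [
        [100, 200, 300, 400],
        [200, 400, 600, 800],
        [2, 4, 6, 8],  # a sample with very few reads
    ]
    for sid, total, sf in low_size_factor_samples(ids, counts):
        print(f"{sid}: total counts = {total}, size factor = {sf:.3f}")
```

A size factor far below the rest of the cohort means the sample was sequenced much more shallowly (or failed). Such samples are worth inspecting before fitting the autoencoder.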

jfertaj commented 9 months ago

Yes, that was the problem. The size factors were too low (0.05-0.09) for four samples.
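One way to act on this is to take the affected samples out of the analysis group before rerunning the module. The following is a minimal sketch that edits a tab-separated sample annotation in memory and removes one group from the DROP_GROUP column for the given RNA_IDs. It assumes those column names and a comma-separated list of groups per sample, so check both against your own annotation file. The function name is hypothetical.

```python
import csv
import io


def remove_samples_from_group(annotation_tsv, rna_ids, group):
    """Return the annotation text with `group` removed from DROP_GROUP for rna_ids.

    Assumes columns named RNA_ID and DROP_GROUP, with groups separated by commas.
    """
    reader = csv.DictReader(io.StringIO(annotation_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["RNA_ID"] in rna_ids:
            groups = [g.strip() for g in row["DROP_GROUP"].split(",") if g.strip()]
            # keep every other group the sample belongs to
            row["DROP_GROUP"] = ",".join(g for g in groups if g != group)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    annotation = "RNA_ID\tDROP_GROUP\nS1\tgroup1\nS2\tgroup1,group2\n"
    print(remove_samples_from_group(annotation, {"S2"}, "group1"))
```

To apply it to the real file, read sample_annotation.txt into a string, pass the IDs flagged as having low size factors, and write the result to a new file. Keeping the original lets you compare both versions before pointing the config's sampleAnnotation at the edited copy.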

Thanks