bahlolab / PLASTER

Nextflow pipeline for long amplicon typing of PacBio SMRT sequencing data
MIT License

processing taking too long #12

Closed (cyriltata closed this issue 2 years ago)

cyriltata commented 2 years ago

After three hours of running the test pipeline, the output hangs at:

[fa/fd89fd] process > preproc:prep_ref:wget          [  0%] 0 of 1
[-        ] process > preproc:prep_ref:mmi           -
[05/542969] process > preproc:pb_ccs:ccs (2)         [100%] 2 of 2 ✔
[cf/4c4382] process > preproc:pb_ccs:merge           [100%] 1 of 1 ✔
[5c/405f23] process > preproc:extract_ccs_failed     [100%] 1 of 1 ✔
[94/e2f40c] process > preproc:pb_lima:lima (SR)      [100%] 2 of 2 ✔
[e2/80f4fd] process > preproc:pb_lima:merge_smry     [100%] 1 of 1 ✔
[-        ] process > preproc:pb_mm2                 -
[-        ] process > preproc:annotate_samples:AS    -
[-        ] process > preproc:annotate_amplicons     -
[-        ] process > preproc:pb_mm2_2               -
[-        ] process > preproc:split_sample_amplic... -
[-        ] process > preproc:index_bam              -
[-        ] process > preproc:alignment_stats        -
[-        ] process > preproc:pre_processing_report  -
cyriltata commented 2 years ago

Here is why:

Command error:
  --2022-05-11 13:07:33--  ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
    (try:11) => ‘chr22.fa.gz’
  Connecting to hgdownload.cse.ucsc.edu (hgdownload.cse.ucsc.edu)|128.114.119.163|:21... failed: Connection timed out.
  Retrying.
jemunro commented 2 years ago

Hi @cyriltata,

Does the machine you are running on have a connection to the internet? It seems to be failing to download the reference genome. Alternatively, there may have been a temporary outage of the UCSC server.
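
One way to tell a blocked connection apart from an outage is to test the relevant ports directly from the machine that runs the pipeline. The following is a minimal sketch, not part of PLASTER: `can_connect` is a hypothetical helper, the host name is the one shown in the wget log above, and testing port 443 assumes the same host also serves files over HTTPS.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

Calling `can_connect("hgdownload.cse.ucsc.edu", 21)` and then the same host with port `443` from the cluster node would show whether only FTP is blocked. If the HTTPS check succeeds while the FTP check fails, a firewall rule is the more likely cause than an outage.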

cyriltata commented 2 years ago

Hi @jemunro

Yes, I am connected to the internet, but I am using an HPC cluster, so maybe there is a firewall there. However, I was able to download the same file manually and place it on our own servers, after which I was able to proceed.

Now I get another error during execution:

Error executing process > 'typing:get_pharmvar_vcf (CYP2D6:5.1.14)'

Caused by:
  Process `typing:get_pharmvar_vcf (CYP2D6:5.1.14)` terminated for an unknown reason -- Likely it has been terminated by the external system
Command error:
  Lines   total/split/realigned/skipped:        1/0/0/0
  Lines   total/split/realigned/skipped:        2/0/0/0
  Lines   total/split/realigned/skipped:        1/0/0/0
  Lines   total/split/realigned/skipped:        4/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        8/0/0/0
  Lines   total/split/realigned/skipped:        6/0/0/0
  Lines   total/split/realigned/skipped:        16/0/0/0
  Lines   total/split/realigned/skipped:        15/0/0/0
  Lines   total/split/realigned/skipped:        19/0/0/0
  Lines   total/split/realigned/skipped:        9/0/0/0
  Lines   total/split/realigned/skipped:        23/0/0/0
  Lines   total/split/realigned/skipped:        21/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        17/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        2/0/0/0
  Lines   total/split/realigned/skipped:        2/0/0/0
  Lines   total/split/realigned/skipped:        7/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        15/0/0/0
  Lines   total/split/realigned/skipped:        2/0/0/0
  Lines   total/split/realigned/skipped:        1/0/0/0
  Lines   total/split/realigned/skipped:        1/0/0/0
  Lines   total/split/realigned/skipped:        1/0/1/0
  Lines   total/split/realigned/skipped:        3/0/1/0
  Lines   total/split/realigned/skipped:        1/0/1/0
  Lines   total/split/realigned/skipped:        2/0/0/0
  Lines   total/split/realigned/skipped:        1/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        3/0/0/0
  Lines   total/split/realigned/skipped:        1/0
jemunro commented 2 years ago

Hi @cyriltata,

It seems like the process is either running out of time or memory.

I've just released a new version that increases the memory for this task and also adds the ability for jobs to re-execute with increased walltime and memory if they fail. Give it a go and see if it solves your problem.
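
The retry behaviour described above can be sketched roughly as follows. This is not PLASTER's actual code (in Nextflow this is usually expressed with `errorStrategy 'retry'` and resources scaled by `task.attempt`); the function names, the base values, and the linear scaling are assumptions chosen for illustration.

```python
from typing import Callable, Tuple


def resources_for_attempt(base_mem_gb: int, base_hours: int, attempt: int) -> Tuple[int, int]:
    """Scale memory and walltime linearly with the 1-based attempt number."""
    return base_mem_gb * attempt, base_hours * attempt


def run_with_retries(job: Callable[[int, int], bool],
                     base_mem_gb: int = 4,
                     base_hours: int = 1,
                     max_attempts: int = 3) -> int:
    """Run job(mem_gb, hours) until it succeeds and return the successful attempt number.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        mem_gb, hours = resources_for_attempt(base_mem_gb, base_hours, attempt)
        if job(mem_gb, hours):
            return attempt
    raise RuntimeError(f"job failed after {max_attempts} attempts")
```

A job that is killed for exceeding its memory limit on the first attempt gets twice the base allocation on the second attempt and three times on the third, so a task that was only slightly under-provisioned can finish without manual intervention.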

cyriltata commented 2 years ago

@jemunro almost everything ran smoothly, except for a MySQL connection failure. I suspect it may be a firewall issue at my end.

Command error:
  Possible precedence issue with control flow operator at /opt/conda/envs/PLASTER_22.05.01/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
  DBI connect('host=ensembldb.ensembl.org;port=3306','anonymous',...) failed: Can't connect to MySQL server on 'ensembldb.ensembl.org' (110) at /opt/conda/envs/PLASTER_22.05.01/share/ensembl-vep-104.0-0/Bio/EnsEMBL/Registry.pm line 1771.

Execution state

[8d/0e5450] process > typing:prep_ref:wget           [100%] 1 of 1 ✔
[eb/0292d4] process > typing:prep_ref:fai_dict       [100%] 1 of 1 ✔
[99/bf386d] process > typing:get_pharmvar_vcf (5.... [100%] 1 of 1 ✔
[b0/b00347] process > typing:prep_bams:merge (NA1... [100%] 4 of 4 ✔
[7b/6a0864] process > typing:prep_bams:fusion_cal... [100%] 10 of 10 ✔
[a6/907e5f] process > typing:prep_bams:fusion_rep... [100%] 1 of 1 ✔
[c9/89e41c] process > typing:prep_bams:downsample... [100%] 2 of 2 ✔
[82/240087] process > typing:prep_bams:index (NA1... [100%] 19 of 19 ✔
[1b/9991fe] process > typing:gatk_1:haplotype_cal... [100%] 19 of 19 ✔
[d5/127fa0] process > typing:gatk_1:genotype_gvcf... [100%] 2 of 2 ✔
[b4/b6ee84] process > typing:phase:get_snp_pos (C... [100%] 2 of 2 ✔
[98/fa1ab7] process > typing:phase:assign_snps (N... [100%] 19 of 19 ✔
[ff/b33f33] process > typing:phase:amp_phaser (NA... [100%] 19 of 19 ✔
[12/1a3140] process > typing:phase:split_phases (... [100%] 16 of 16 ✔
[12/271b53] process > typing:gatk_2:haplotype_cal... [100%] 28 of 28 ✔
[78/2d8072] process > typing:gatk_2:genotype_gvcf... [100%] 2 of 2 ✔
[76/cb1677] process > typing:gatk_2:get_targ_site... [100%] 1 of 1 ✔
[f0/515288] process > typing:gatk_2:call_targ_sit... [100%] 19 of 19 ✔
[64/494ec9] process > typing:gatk_2:merge_targ_si... [100%] 1 of 1 ✔
[bd/841380] process > typing:vep (CYP2D7)            [ 75%] 3 of 4, failed: 3...
[-        ] process > typing:pharmvar_star_allele    -

Error executing process > 'typing:vep (CYP2D7)'
cyriltata commented 2 years ago

@jemunro just as a follow-up, could we use VEP over HTTP instead of a database connection?

    vep --input_file $vcf \\
        --database \\
        --format vcf \\
        --vcf \\
        --everything \\
        --allele_number \\
        --variant_class \\
        --dont_skip \\
        --assembly $params.vep_assembly \\
        --cache_version $params.vep_cache_ver \\
        --allow_non_variant \\
        --pick_allele_gene \\
        --output_file STDOUT |
        bcftools view --no-version -Oz -o $out
    bcftools index -t $out
cyriltata commented 2 years ago

So I had to set up my own local DB. It was a firewall issue.
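
For anyone hitting the same firewall, pointing VEP at a different database server means adding connection options next to `--database`. The sketch below only assembles such a command line and is not taken from the pipeline: `--host`, `--port`, and `--user` are VEP's database connection options, while the helper `vep_database_args`, the mirror host name, and the reduced set of flags are placeholders for illustration. The default port and user match the values in the error log above and may differ on a local server.

```python
import shlex
from typing import List


def vep_database_args(vcf: str, host: str, port: int = 3306, user: str = "anonymous") -> List[str]:
    """Build a vep argument list that queries a specific Ensembl database server."""
    return [
        "vep",
        "--input_file", vcf,
        "--database",
        "--host", host,
        "--port", str(port),
        "--user", user,
        "--format", "vcf",
        "--vcf",
        "--output_file", "STDOUT",
    ]


# Hypothetical local mirror; replace with the real server name.
args = vep_database_args("input.vcf", host="ensembl-mirror.example.org")
print(shlex.join(args))
```

Building the arguments as a list keeps each option and its value separate, which avoids quoting problems if the command is later passed to `subprocess.run`.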