epi2me-labs / wf-artic

ARTIC SARS-CoV-2 workflow and reporting
https://labs.epi2me.io/

Consensus sequences are not produced even after good coverage #97

Closed Rohit-Satyam closed 11 months ago

Rohit-Satyam commented 11 months ago

Operating System

Ubuntu 22.04

Other Linux

No response

Workflow Version

0.3.31

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run epi2me-labs/wf-artic --fastq $PWD/fastq_pass/ --out_dir wf-articresults --scheme_version ARTIC/V4.1 --medaka_model r941_min_fast_variant_g507 --pangolin_version latest --update_data true --report_detailed true -profile singularity --min_len 150 -resume

Workflow Execution - CLI Execution Profile

None

What happened?

The command executed successfully, but all the samples seem to fail in the HTML report and empty variant files are produced.

Relevant log output

nextflow run epi2me-labs/wf-artic --fastq $PWD/fastq_pass/ --out_dir wf-articresults --scheme_version ARTIC/V4.1 --medaka_model r941_min_fast_variant_g507 --pangolin_version latest --update_data true --report_detailed true -profile singularity --min_len 150 -resume
N E X T F L O W  ~  version 23.04.3
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [cb0ad4a440]
Launching `https://github.com/epi2me-labs/wf-artic` [sleepy_cantor] DSL2 - revision: a60a1e1e73 [master]

WARN: Found unexpected parameters:
* --scheme_dir: primer_schemes
- Ignore this warning: params.schema_ignore_params = "scheme_dir" 

Core Nextflow options
  revision        : master
  runName         : sleepy_cantor
  containerEngine : singularity
  launchDir       : /data/covid_sept2023
  workDir         : /data/covid_sept2023/work
  projectDir      : /home/subudhak/.nextflow/assets/epi2me-labs/wf-artic
  userName        : subudhak
  profile         : singularity
  configFiles     : /home/subudhak/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config

Basic Input/Output Options
  out_dir         : wf-articresults
  fastq           : /data/covid_sept2023/fastq_pass/

Primer Scheme Selection
  scheme_version  : ARTIC/V4.1

Advanced options
  min_len         : 150
  update_data     : true
  pangolin_version: latest
  normalise       : 200
  medaka_model    : r941_min_fast_variant_g507

Reporting Options
  report_detailed : true

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use epi2me-labs/wf-artic for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

      ------------------------------------
      Available Primer Schemes:
      ------------------------------------

  Name         Version
  spike-seq    ONT/V1
  spike-seq    ONT/V4.1
  SARS-CoV-2   NEB-VarSkip/v2b
  SARS-CoV-2   NEB-VarSkip/v2
  SARS-CoV-2   NEB-VarSkip/v1a-long
  SARS-CoV-2   NEB-VarSkip/v1a
  SARS-CoV-2   Midnight-IDT/V1
  SARS-CoV-2   ARTIC/V2
  SARS-CoV-2   ARTIC/V1
  SARS-CoV-2   ARTIC/V4
  SARS-CoV-2   ARTIC/V4.1
  SARS-CoV-2   ARTIC/V3
  SARS-CoV-2   Midnight-ONT/V2
  SARS-CoV-2   Midnight-ONT/V1
  SARS-CoV-2   Midnight-ONT/V3

      ------------------------------------

Checking fastq input.
Non barcoded directories detected.
executor >  local (297)
[c7/cbc70e] process > pipeline:getVersions     [100%] 1 of 1, cached: 1 ✔
[07/4ddab8] process > pipeline:getParams       [100%] 1 of 1, cached: 1 ✔
[c0/48faa5] process > pipeline:copySchemeDir   [100%] 1 of 1, cached: 1 ✔
[46/d596c4] process > pipeline:preArticQC (53) [100%] 56 of 56, cached: 56 ✔
[c8/16a633] process > pipeline:runArtic (31)   [100%] 56 of 56 ✔
[fc/64adc8] process > pipeline:combineDepth    [100%] 1 of 1 ✔
[d1/b5660f] process > pipeline:allConsensus    [100%] 1 of 1 ✔
[f0/b14a49] process > pipeline:allVariants     [100%] 1 of 1 ✔
[f9/26c14e] process > pipeline:prep_nextclade  [100%] 1 of 1, cached: 1 ✔
[7c/55d8fd] process > pipeline:nextclade       [100%] 1 of 1 ✔
[fa/d24735] process > pipeline:pangolin        [100%] 1 of 1 ✔
[5f/d5b70e] process > pipeline:telemetry       [100%] 1 of 1 ✔
[0f/02a4e9] process > pipeline:report          [100%] 1 of 1 ✔
[db/4ac8fc] process > output (231)             [100%] 234 of 234 ✔
WARN: Failed to render execution report -- see the log file for details
WARN: Failed to render execution timeline -- see the log file for details
Completed at: 25-Sep-2023 18:15:21
Duration    : 28m 40s
CPU hours   : 1.3 (1.1% cached)
Succeeded   : 297
Cached      : 60

wf-artic-report.zip

mattdmem commented 11 months ago

Hello @Rohit-Satyam

This is certainly odd. What immediately worries me is the names of the samples:

024_barcode31, 1012643340-D_barcode06, 1016083501-D_barcode02, 1018515054-D_barcode05, 1054883101-D_barcode09....etc

This might be causing some unexpected behaviour, but I can't be sure. You could perhaps try running one or two samples with just the fastq folder called barcode01 and barcode02 etc.

The other thing that catches my eye is a shorter read length distribution:

[Screenshot 2023-09-25 at 18 42 40: read length distribution]

...but as you say coverage looks alright.

Try the above and let me know how you get on.

Matt

Rohit-Satyam commented 11 months ago

Hi @mattdmem, thanks for your quick reply. I renamed the directories so that only the barcode names remain. I also realized that the old parameter --medaka_model has been replaced by --medaka_variant_model. Using the old name should throw a warning or an error, but it doesn't.

Anyway, all the barcodes still fail. Below is the output of the artic pipeline for two samples. As for the read lengths: yes, the average read length here is less than the 350bp we had in our previously sequenced samples. But any read longer than 75bp should still align accurately, shouldn't it? We therefore keep the upper limit at 700bp to avoid chimeric reads but use --min_len 100 or 150. Having said that, I was wondering whether you could use Yacrd to remove chimeric reads instead of using a stringent cutoff of 700bp. I came across this tool recently while analyzing other nanopore data, so I thought it might be a good addition to this pipeline. Just thinking!!
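To see how many reads a given length window would actually keep, a quick count over the FASTQ can help. This is only a sketch, not part of wf-artic: the function name count_reads_in_window is hypothetical, and it assumes plain 4-line FASTQ records on stdin.

```shell
# Hypothetical helper (not part of wf-artic).
# Reads 4-line FASTQ records from stdin and prints "kept total", where
# "kept" is the number of reads whose length lies within [min, max].
count_reads_in_window() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 2 {                 # sequence line of each record
            total++
            n = length($0)
            if (n >= min && n <= max) kept++
        }
        END { printf "%d %d\n", kept, total }
    '
}
```

For example, `gzip -dc fastq_pass/barcode01/*.fastq.gz | count_reads_in_window 150 700` would report how many reads fall between 150bp and 700bp (the barcode01 path is an assumed layout).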

nextflow run epi2me-labs/wf-artic --fastq $PWD/fastq_pass_renamed/ --out_dir wf-articresults_renamed --scheme_version ARTIC/V4.1 --medaka_variant_model r941_min_fast_variant_g507 --pangolin_version latest --update_data true --report_detailed true -profile singularity --min_len 100

wf-artic-report.zip

mattdmem commented 11 months ago

Curious!

Can you send me a bam file from the output? You can attach here if it isn’t too big. Please make sure there are no human reads along for the ride.

Thanks

Matt

Rohit-Satyam commented 11 months ago

Hi @mattdmem. I am embarrassed: I think I found my own mistake. This time I cloned the repository and ran the latest version as follows, and it worked.

nextflow run main.nf --fastq /data/covid_sept2023/fastq_pass --out_dir wf-articresults --scheme_version \
ARTIC/V4.1 --medaka_variant_model r941_min_fast_variant_g507 \
--pangolin_version latest --update_data true  -profile singularity --min_len 100 --update_data false -resume

I shouldn't have ignored the following warning:

NOTE: Your local project version looks outdated - a different revision is available in the remote repository [cb0ad4a440]
Launching `https://github.com/epi2me-labs/wf-artic` [special_kalman] DSL2 - revision: a60a1e1e73 [master]
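Since this note is easy to miss in a long log, one option is to scan captured Nextflow output for it. A minimal sketch, assuming the output has been saved or piped somewhere readable; the function name warns_outdated is hypothetical, and the matched text is taken from the warning quoted above.

```shell
# Hypothetical helper: succeeds (exit status 0) if the text on stdin
# contains Nextflow's "local project version looks outdated" note.
warns_outdated() {
    grep -q 'Your local project version looks outdated'
}
```

For example, `nextflow run epi2me-labs/wf-artic ... 2>&1 | tee run.log` followed by `warns_outdated < run.log && echo "cached pipeline copy is behind the remote"` (run.log is an assumed file name).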

I was under the impression that each time I specify nextflow info epi2me-labs/wf-artic, Nextflow fetches the latest version of the wf-artic pipeline from GitHub, but when I checked, that is not the case:

nextflow info epi2me-labs/wf-artic
 project name: epi2me-labs/wf-artic
 repository  : https://github.com/epi2me-labs/wf-artic
 local path  : /home/subudhak/.nextflow/assets/epi2me-labs/wf-artic
 main script : main.nf
 description : Workflow for SARS-CoV-2 Network ARTIC analysis.
 author      : Oxford Nanopore Technologies
 revisions   : 
 * master (default)
   mindepth
   prerelease
   schema
   v0.0.2 [t]
   v0.0.3 [t]
   v0.0.4 [t]
   v0.0.5 [t]
   v0.0.6 [t]
   v0.0.7 [t]
   v0.1.0 [t]
   v0.1.1 [t]
   v0.1.2 [t]
   v0.1.3 [t]
   v0.1.4 [t]
   v0.2.0 [t]
   v0.2.1 [t]
   v0.2.2 [t]
   v0.2.3 [t]
   v0.3.0 [t]
   v0.3.10 [t]
   v0.3.11 [t]
   v0.3.12 [t]
   v0.3.13 [t]
   v0.3.14 [t]
   v0.3.15 [t]
   v0.3.16 [t]
   v0.3.18 [t]
   v0.3.2 [t]
   v0.3.3 [t]
   v0.3.4 [t]
   v0.3.5 [t]
   v0.3.6 [t]
   v0.3.7 [t]
   v0.3.8 [t]
   v0.3.9 [t]

When I checked the nextflow.config file at /home/subudhak/.nextflow/assets/epi2me-labs/wf-artic, I realised I was using an old version, v0.3.18 (Jul 14, 2022). Can you tell me how to ensure that the latest version of the wf-artic pipeline is used without having to clone the repo each time? Is there a Nextflow argument for that?
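The check described here can be scripted. This sketch assumes the cached nextflow.config declares the release in a line of the form version = 'vX.Y.Z' (as in a manifest block); the function name cached_workflow_version is hypothetical.

```shell
# Hypothetical helper: print the first "version = '...'" value found in a
# Nextflow config file. Assumes the release is declared on its own line;
# lines such as "nextflowVersion = ..." are deliberately not matched.
cached_workflow_version() {
    sed -n "s/^[[:space:]]*version[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1" | head -n 1
}
```

For example, `cached_workflow_version ~/.nextflow/assets/epi2me-labs/wf-artic/nextflow.config` prints the version of the copy that `nextflow run epi2me-labs/wf-artic` would launch.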

mattdmem commented 11 months ago

No worries - I am glad you have solved the problem!

On the command line, nextflow pull epi2me-labs/wf-artic should update the local copy of the repository. In the EPI2ME desktop app, you can check for updates using the button.
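To avoid forgetting the update step, the pull and the run can be wrapped together. A minimal sketch: the wrapper name run_updated is hypothetical, while nextflow pull and nextflow run are the standard Nextflow subcommands. Nextflow's run command also accepts a -latest option that pulls the latest changes before running, and -r to pin a specific revision or tag.

```shell
# Hypothetical wrapper: refresh the cached pipeline, then run it.
# Usage: run_updated <pipeline> [nextflow run arguments...]
run_updated() {
    _pipeline="$1"
    shift
    # Update the copy cached under ~/.nextflow/assets; stop if that fails.
    nextflow pull "$_pipeline" || return 1
    # Launch the (now current) pipeline with the remaining arguments.
    nextflow run "$_pipeline" "$@"
}
```

For example: `run_updated epi2me-labs/wf-artic --fastq fastq_pass --out_dir wf-articresults --scheme_version ARTIC/V4.1 -profile singularity`.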