Medaka enabled pipeline exits with longshot failure

szufan commented 3 years ago

Hi Will/Nick!

I'm having an issue similar to this one.

I recently updated artic-ncov2019 from v.1.0.0 to v.1.1.3. Now when I run:

artic minion --medaka --normalise 200 --threads 72 --scheme-directory ~/artic-ncov2019/primer_schemes --read-file simulated_reads.tgz nCoV-2019/V3 test

The pipeline exits with the error:

Command failed:longshot -P 0 -F -A --no_haps --bam test.primertrimmed.rg.sorted.bam --ref ~/artic-ncov2019/primer_schemes/nCoV-2019/V3/nCoV-2019.reference.fasta --out test.longshot.vcf --potential_variants test.merged.vcf.gz

There also seems to be an upstream error:

WARNING: Potential variant VCF contains contig b'MN908947.3' not found in BAM contigs.
error: Error reading potential variants VCF file.
caused by: Error accessing tid from chrom2tid data structure

My env has longshot=0.4.1 installed, which was the dependency pinned in artic-ncov2019/environment-medaka.yml in v.1.0.0. However, I see that several medaka-related dependencies are no longer listed in the new artic-ncov2019/environment.yml. I'm unsure how to troubleshoot this one further. I have attached the full output log and package list: packages.txt log.txt

PaolaArzuffi commented 3 years ago

Hello,

I see the same issue you're describing:

I am running the artic minion --medaka command on a Debian machine within the artic-ncov2019 environment.

run artic minion --medaka --normalise 200 --threads 8 \
    --scheme-directory ../primer_schemes \
    --read-file "$read_file" "$scheme" \
    "$prefix" >>"$log_file" 2>&1

I have noticed that this command fails because the contig name in the VCF does not match the contig name in the BAM file:

2021-02-05 14:29:54 Reading potential variants from input VCF...
WARNING: Potential variant VCF contains contig b'MN908947.3' not found in BAM contigs.
error: Error reading potential variants VCF file.
caused by: Error accessing tid from chrom2tid data structure

All the VCFs generated by medaka_variant have the b'MN908947.3' formatting in the CHROM column, even though the header declares the contig as MN908947.3:

##fileformat=VCFv4.1
##medaka_version=1.0.3
##contig=<ID=MN908947.3>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Medaka genotype.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Medaka genotype quality score">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
b'MN908947.3'   55  .   AGATC   CTTTAAAA    0.131   PASS        GT:GQ   1:0
b'MN908947.3'   62  .   TTCTCT  AAAAAAAAAAAACCCAAAAAAA  0.302   PASS        GT:GQ   1:0
b'MN908947.3'   69  .   AA  CTTT    0.076   PASS        GT:GQ   1:0
b'MN908947.3'   72  .   GAACTTT CCCTTAAAAAAAAAAAA   0.272   PASS        GT:GQ   1:0

I asked the medaka team, but they told me they have never seen this behaviour outside artic. I am not sure how to troubleshoot this either.

Many thanks Paola
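
For anyone wanting to check or patch an affected VCF by hand, here is a minimal sketch. The b'...' wrapper is what Python produces when a bytes object is formatted as text, which would be consistent with a str/bytes mismatch between library versions, although nothing in this thread confirms where it happens. The function name `normalise_chrom` is hypothetical, and this is a workaround under those assumptions, not a fix endorsed by the artic or medaka maintainers.

```python
import re

# A CHROM value that is the text form of a Python bytes object, e.g. b'MN908947.3'
BYTES_REPR = re.compile(r"b'(.*)'")


def normalise_chrom(vcf_line: str) -> str:
    """Strip a b'...' wrapper from the CHROM column of one tab-delimited VCF line."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through untouched
    chrom, sep, rest = vcf_line.partition("\t")
    match = BYTES_REPR.fullmatch(chrom)
    if match:
        chrom = match.group(1)
    return chrom + sep + rest


# How the wrapper can arise: converting bytes to str keeps the b'' markers.
assert str(b"MN908947.3") == "b'MN908947.3'"
```

Applied line by line to an uncompressed copy of the VCF, this leaves the header and any already-clean records unchanged and rewrites only the records whose CHROM carries the wrapper.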

will-rowe commented 3 years ago

Hi @szufan and @PaolaArzuffi - thanks for opening the issue.

I've had a few people report this issue but I've never been able to replicate it. I thought it was fixed with the latest conda recipe for artic, so sorry for closing the other issue prematurely.

Can I ask either of you to make a fresh environment and install just the artic pipeline?

conda install -c bioconda -c conda-forge artic

szufan commented 3 years ago

Thanks, @will-rowe .

Yes, the medaka workflow in artic works fine. Ideally, we'd like to keep using artic-ncov2019.

will-rowe commented 3 years ago

No worries @szufan - just trying to catch where the issue is. It looks like something in this artic-ncov2019 env is incompatible with the dependencies used by the latest versions of the artic pipeline. I don't actually maintain this repo, so I'll check in with Nick and see if we can update it to be more friendly with the latest artic pipeline requirements.

cpmorris82 commented 3 years ago

I have had this exact issue, with the b'MN908947.3' contig name and the longshot failure, on 3 different machines now: Ubuntu 18 and 20, both under WSL and on full Ubuntu. Let me know if anything would help pinpoint this.

nickloman commented 3 years ago

I am pretty sure we just need to update artic-ncov2019 to mirror the latest versions of dependencies around medaka as per fieldbioinformatics?

will-rowe commented 3 years ago

This environment doesn't actually contain any of the pipeline dependencies directly; we pull them in with the pipeline via conda. This environment contains rampart, artic and then some other stuff. The fix would be to get rid of the "other stuff" - I can't see where this is being used in the SOPs? e.g. datrie, eigen, ete3 etc.? One of these will be pinning an incompatible dependency, I think.

Second, this environment has pinned an older version of the pipeline. It might be time to bump that up to the latest version? This isn't currently possible in the current environment as there are conflicts (from the "other stuff") - and we also have a new release candidate which we should release soon (which has a cleaner env), so maybe wait until then before we update this environment?
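
To compare what is actually installed in the failing and working environments, a small sketch like the one below may help. It only sees Python distributions; longshot is a standalone binary, so its version still has to come from the conda package list. The distribution names passed in (`artic`, `medaka`, `pysam`) are assumptions and may need adjusting to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your package list.
    for name, ver in installed_versions(["artic", "medaka", "pysam"]).items():
        print(f"{name}\t{ver or 'not installed'}")
```

Running it once inside the artic-ncov2019 env and once inside a fresh artic-only env gives two short lists that can be compared side by side.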

rebeelouise commented 2 years ago

Hi all,

I am now having this error! I installed with mamba as opposed to conda as I was having some other issues with dependencies etc. Was there any resolution to this?

:)

error: {} ERROR: Max read coverage set to 0.
Command failed:longshot

This script has been working fine since I started working on a new system. Any advice welcome!

artic-network / artic-ncov2019

Medaka enabled pipeline exits with longshot failure #53