dsarov / ARDaP

Comprehensive resistance detection from WGS data
17 stars 6 forks source link

Ardap doesn't report correctly #33

Closed Koen-vdl closed 6 months ago

Koen-vdl commented 6 months ago

Hi,

I noticed all of my burky isolates have no drug resistance predicted after completing ardap. I troubleshooted by downloading some read-sets from the ardappaper and confirmed ardap fails to report any resistance determinants. I would appreciate advice to get ardapto report correctly.

I installed the conda software environment as detailed below as the recommended approach (yaml with more version fixing) didn't work:

conda create -n ardap_test -y
conda activate ardap_test
conda install biopython art trimmomatic bwa bedtools=2.28.0 seqtk pindel mosdepth samtools=1.9 picard=2.27.5 gatK4 sqlite snpEff=4.3.1t nextflow=22.10.6 fasttree bcftools kma
pip3 install tabulate cgecore gitpython python-dateutil

I then run ardap like this: nextflow run ~/bin/ardap/main.nf --species Burkholderia_pseudomallei --fastq "/srv/READ_ARCHIVE/trimmomatic/$id/$id*R{1,2}.fastq.gz"

ardap runs complete successfully with no errors reported:

executor >  local (8)
[17/1a3704] process > IndexReference                  [100%] 1 of 1 ✔
[22/4440fb] process > Trimmomatic (SRR3404582)        [100%] 1 of 1 ✔
[b2/c7caea] process > Downsample (SRR3404582)         [100%] 1 of 1 ✔
[2e/e246bb] process > ReferenceAlignment (SRR3404582) [100%] 1 of 1 ✔
[3c/322b40] process > Deduplicate (SRR3404582)        [100%] 1 of 1 ✔
[18/83d616] process > VariantCalling (SRR3404582)     [100%] 1 of 1 ✔
[cf/8d982f] process > SqlSnpsIndels (SRR3404582)      [100%] 1 of 1 ✔
[06/33875f] process > R_report (SRR3404582)           [100%] 1 of 1 ✔

Done! Result files are in --> ./Outputs

However, the reports indicate "No antibiotic resistance identified".

I tested it with SRR3404582 reads from the ardap paper (sample MSHR5654) which should report 3 resistance determinates. As you can see none are reported:

image

dsarov commented 6 months ago

Hi, Could you possible e-mail me your nextflow.log file? Easiest e-mail for me is dsarovich@usc.edu.au

I'll take a look and get back to you.

Cheers,

Derek

Koen-vdl commented 6 months ago

Thanks for getting back to me so soon Derek. I pasted the log here:

.nextflow.log

dsarov commented 6 months ago

I think the issue might be how the reads are being loaded into the pipeline. Try copying or linking the reads into the directory where you are running the pipeline and leaving --fastq as "_{1,2}.fastq.gz". When you downloaded from the SRA, what did your reads end up being named? When I tested this morning it worked, but my reads were named SRR3404582_1.fastq.gz and SRR3404582_2.fastq.gz. I'm not sure what the names would have to be to match the --fastq you have in your command (R{1,2}.fastq.gz ).

If that still doesn't work, could you take a look at the VCF file that's output in Outputs/Variants/Annotated and see if there are any SNPs being identified at all?

Cheers,

Derek

Koen-vdl commented 6 months ago

Hi @dsarov,

I automatically pass fastq-dumps from SRA to trimmomatic hence the renaming to R1 R2. I renamed them to SRR3404582_1.fastq.gz and SRR3404582_2.fastq.gz and placed them in ~/bin/ardap/ as per your suggestion. I then ran ardap with: nextflow run ~/bin/ardap/main.nf --species Burkholderia_pseudomallei --fastq "*_{1,2}.fastq.gz"

Unfortunately this did not change anything: I still don't pick up any AMR.

I had a look at the VCF files in Outputs/Variants/Annotated: these are empty files. I then checked the unannotated files in Outputs/Variants/VCFs/: these have some entries in them. I uploaded the VCF folders here in case you want to take a peek. SRR3404582.zip

Can't work out what's causing this...

dsarov commented 6 months ago

I posted a reply a second ago but then actually looked through your attached files... I should have done that first :)

It looks like the annotation component of the pipeline is failing. The SNPs are being identified but I'm guessing that snpEffect is failing for some reason. Could you load the ARDaP environment and try running snpEffect to annotate one of your VCF files? If you are in the directory with the vcf, the command should be...

snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > SRR3404582.PASS.snps.annotated.vcf

I've attached what that file should look like after annotation but I'm guessing it'll have an issue when you try running it on your system.

Cheers,

Derek

SRR3404582.PASS.snps.annotated.zip

Koen-vdl commented 6 months ago

The resulting SRR3404582.PASS.snps.annotated.vcf is empty and the following text is printed to the screen:

(ardap_test) koen_vdl@srvlxhpc1:~/bin/ardap/Outputs/Variants/VCFs$ snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > test
00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00        Command: 'eff'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'Burkholderia_pseudomallei_k96243'
00:00:00        Reading config file: /home/koen_vdl/bin/ardap/Outputs/Variants/VCFs/snpEff.config
00:00:00        Reading config file: /home/koen_vdl/miniconda3/envs/ardap_test/share/snpeff-4.3.1t-5/snpEff.config
00:00:00        done
00:00:00        Reading database for genome version 'Burkholderia_pseudomallei_k96243' from file '/home/koen_vdl/bin/ardap/resources/snpeff/Burkholderia_pseudomallei_k96243/snpEffectPredictor.bin' (this might take a while)
00:00:00        done
00:00:00        Loading Motifs and PWMs
00:00:00        Building interval forest
00:00:00        done.
00:00:00        Genome stats :
#-----------------------------------------------
# Genome name                : 'Burkholderia_pseudomallei_k96243'
# Genome version             : 'Burkholderia_pseudomallei_k96243'
# Genome ID                  : 'Burkholderia_pseudomallei_k96243[0]'
# Has protein coding info    : true
# Has Tr. Support Level info : true
# Genes                      : 6027
# Protein coding genes       : 5850
#-----------------------------------------------
# Transcripts                : 6027
# Avg. transcripts per gene  : 1.00
# TSL transcripts            : 0
#-----------------------------------------------
# Checked transcripts        :
#               AA sequences :   5729 ( 97.93% )
#              DNA sequences :      0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 5850
#              Length errors :    105 ( 1.79% )
#  STOP codons in CDS errors :     20 ( 0.34% )
#         START codon errors :    802 ( 13.71% )
#        STOP codon warnings :     17 ( 0.29% )
#              UTR sequences :      0 ( 0.00% )
#               Total Errors :    804 ( 13.74% )
# WARNING                    : No protein coding transcript has UTR
#-----------------------------------------------
# Cds                        : 5731
# Exons                      : 6120
# Exons with sequence        : 6120
# Exons without sequence     : 0
# Avg. exons per transcript  : 1.02
# WARNING                    : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes      : 2
# Chromosomes                : Format 'chromo_name size codon_table'
#               '1'     4074542 Standard
#               '2'     3173005 Standard
#-----------------------------------------------

00:00:01        Predicting variants
00:00:01        Running multi-threaded mode (numThreads=64).
dsarov commented 6 months ago

After the last line of that output, were there a couple more lines? 00:00:01 done. 00:00:01 Logging 00:00:02 Checking for updates... 00:00:03 Done.

With the above command, the annotated SNPs should be output into a "test". Was there anything in that file? Could you double check that the VCF (SRR3404582.PASS.snps.vcf) has some snps for snpEff to annotate?

Other than the above, there doesn't appear to be anything wrong with your snpEff installation. Maybe the error has been output in the test file?

Koen-vdl commented 6 months ago

Hi,

I tried everything from scratch again and now annotation works. snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > SRR3404582.PASS.snps.annotated.vcf

SRR3404582.PASS.snps.annotated.zip

I checked main.nf and the pipeline uses the same snpEff approach. Can't work out why it fails at this step.

I uploaded the entire run here (including the work folder) in case you wish to take a closer look: ==> https://e.pcloud.link/publink/show?code=XZqWYMZ3z1BvCoBSOBhu59q2IHoW4xo3Ezy

Cheers, KV

dsarov commented 6 months ago

Thanks Koen,

I was just about to ask you to upload the work folder. There'll be a log in there that tells us what's going wrong. I'm a bit busy today but will get back to you. Keen to find the bug so I can fix it in the software.

Cheers,

Derek

Koen-vdl commented 6 months ago

Sweet! Thanks for the support, much appreciated.

dsarov commented 6 months ago

I think I've tracked it down. From what I can tell, snpEff was running in multithreaded mode, but I hadn't restricted the number of threads to be used, so was running with the number of available threads on the CPU rather than the number of threads specified by the job resources. In your log files, snpEff ran but looked to not complete properly. It never reported an error, which allowed the pipeline to keep running. I'm doing a reasonably sized update at the moment and will include that bug. I'll let you know when I've pushed the update.

Still not exactly sure if this is the problem, but worth a shot

dsarov commented 6 months ago

Hi Koen,

I think I've (hopefully) fixed the error now. Could you update your installation and try again?

Thanks

Derek

Koen-vdl commented 6 months ago

Updated and ran again. It works like a charm now! Thank you so much for ironing out this issue @dsarov !

dsarov commented 6 months ago

Fantastic!

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Koen-vdl @.> Sent: Tuesday, May 7, 2024 7:39:36 PM To: dsarov/ARDaP @.> Cc: Derek Sarovich @.>; Mention @.> Subject: Re: [dsarov/ARDaP] Ardap doesn't report correctly (Issue #33)

CAUTION:This is an external email. Please take care when selecting links or opening attachments. When in doubt, please contact the IT Service Desk or IT Student Help Desk.

Updated and ran again. It works like a charm now! Thank you so much for ironing out this issue @dsarovhttps://github.com/dsarov !

— Reply to this email directly, view it on GitHubhttps://github.com/dsarov/ARDaP/issues/33#issuecomment-2097879490, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVRRXBUYDKNOGG7JMCEZLDZBCONRAVCNFSM6AAAAABGWJAAY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXHA3TSNBZGA. You are receiving this because you were mentioned.Message ID: @.***>