Closed Koen-vdl closed 6 months ago
Hi, Could you possible e-mail me your nextflow.log file? Easiest e-mail for me is dsarovich@usc.edu.au
I'll take a look and get back to you.
Cheers,
Derek
Thanks for getting back to me so soon Derek. I pasted the log here:
I think the issue might be how the reads are being loaded into the pipeline. Try copying or linking the reads into the directory where you are running the pipeline and leaving --fastq as "_{1,2}.fastq.gz". When you downloaded from the SRA, what did your reads end up being named? When I tested this morning it worked, but my reads were named SRR3404582_1.fastq.gz and SRR3404582_2.fastq.gz. I'm not sure what the names would have to be to match the --fastq you have in your command (R{1,2}.fastq.gz ).
If that still doesn't work, could you take a look at the VCF file that's output in Outputs/Variants/Annotated and see if there are any SNPs being identified at all?
Cheers,
Derek
Hi @dsarov,
I automatically pass fastq-dumps from SRA to trimmomatic hence the renaming to R1 R2. I renamed them to SRR3404582_1.fastq.gz
and SRR3404582_2.fastq.gz
and placed them in ~/bin/ardap/
as per your suggestion. I then ran ardap with:
nextflow run ~/bin/ardap/main.nf --species Burkholderia_pseudomallei --fastq "*_{1,2}.fastq.gz"
Unfortunately this did not change anything: I still don't pick up any AMR.
I had a look at the VCF files in Outputs/Variants/Annotated
: these are empty files. I then checked the unannotated files in Outputs/Variants/VCFs/
: these have some entries in them. I uploaded the VCF folders here in case you want to take a peek.
SRR3404582.zip
Can't work out what's causing this...
I posted a reply a second ago but then actually looked through your attached files... I should have done that first :)
It looks like the annotation component of the pipeline is failing. The SNPs are being identified but I'm guessing that snpEffect is failing for some reason. Could you load the ARDaP environment and try running snpEffect to annotate one of your VCF files? If you are in the directory with the vcf, the command should be...
snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > SRR3404582.PASS.snps.annotated.vcf
I've attached what that file should look like after annotation but I'm guessing it'll have an issue when you try running it on your system.
Cheers,
Derek
The resulting SRR3404582.PASS.snps.annotated.vcf
is empty and the following text is printed to the screen:
(ardap_test) koen_vdl@srvlxhpc1:~/bin/ardap/Outputs/Variants/VCFs$ snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > test
00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'eff'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'Burkholderia_pseudomallei_k96243'
00:00:00 Reading config file: /home/koen_vdl/bin/ardap/Outputs/Variants/VCFs/snpEff.config
00:00:00 Reading config file: /home/koen_vdl/miniconda3/envs/ardap_test/share/snpeff-4.3.1t-5/snpEff.config
00:00:00 done
00:00:00 Reading database for genome version 'Burkholderia_pseudomallei_k96243' from file '/home/koen_vdl/bin/ardap/resources/snpeff/Burkholderia_pseudomallei_k96243/snpEffectPredictor.bin' (this might take a while)
00:00:00 done
00:00:00 Loading Motifs and PWMs
00:00:00 Building interval forest
00:00:00 done.
00:00:00 Genome stats :
#-----------------------------------------------
# Genome name : 'Burkholderia_pseudomallei_k96243'
# Genome version : 'Burkholderia_pseudomallei_k96243'
# Genome ID : 'Burkholderia_pseudomallei_k96243[0]'
# Has protein coding info : true
# Has Tr. Support Level info : true
# Genes : 6027
# Protein coding genes : 5850
#-----------------------------------------------
# Transcripts : 6027
# Avg. transcripts per gene : 1.00
# TSL transcripts : 0
#-----------------------------------------------
# Checked transcripts :
# AA sequences : 5729 ( 97.93% )
# DNA sequences : 0 ( 0.00% )
#-----------------------------------------------
# Protein coding transcripts : 5850
# Length errors : 105 ( 1.79% )
# STOP codons in CDS errors : 20 ( 0.34% )
# START codon errors : 802 ( 13.71% )
# STOP codon warnings : 17 ( 0.29% )
# UTR sequences : 0 ( 0.00% )
# Total Errors : 804 ( 13.74% )
# WARNING : No protein coding transcript has UTR
#-----------------------------------------------
# Cds : 5731
# Exons : 6120
# Exons with sequence : 6120
# Exons without sequence : 0
# Avg. exons per transcript : 1.02
# WARNING : No mitochondrion chromosome found
#-----------------------------------------------
# Number of chromosomes : 2
# Chromosomes : Format 'chromo_name size codon_table'
# '1' 4074542 Standard
# '2' 3173005 Standard
#-----------------------------------------------
00:00:01 Predicting variants
00:00:01 Running multi-threaded mode (numThreads=64).
After the last line of that output, were there a couple more lines?
00:00:01 done. 00:00:01 Logging 00:00:02 Checking for updates... 00:00:03 Done.
With the above command, the annotated SNPs should be output into a "test". Was there anything in that file? Could you double check that the VCF (SRR3404582.PASS.snps.vcf) has some snps for snpEff to annotate?
Other than the above, there doesn't appear to be anything wrong with your snpEff installation. Maybe the error has been output in the test file?
Hi,
I tried everything from scratch again and now annotation works.
snpEff eff -t -nodownload -no-downstream -no-intergenic -ud 100 -v -dataDir ~/bin/ardap/resources/snpeff Burkholderia_pseudomallei_k96243 SRR3404582.PASS.snps.vcf > SRR3404582.PASS.snps.annotated.vcf
SRR3404582.PASS.snps.annotated.zip
I checked main.nf
and the pipeline uses the same snpEff approach. Can't work out why it fails at this step.
I uploaded the entire run here (including the work folder) in case you wish to take a closer look: ==> https://e.pcloud.link/publink/show?code=XZqWYMZ3z1BvCoBSOBhu59q2IHoW4xo3Ezy
Cheers, KV
Thanks Koen,
I was just about to ask you to upload the work folder. There'll be a log in there that tells us what's going wrong. I'm a bit busy today but will get back to you. Keen to find the bug so I can fix it in the software.
Cheers,
Derek
Sweet! Thanks for the support, much appreciated.
I think I've tracked it down. From what I can tell, snpEff was running in multithreaded mode, but I hadn't restricted the number of threads to be used, so was running with the number of available threads on the CPU rather than the number of threads specified by the job resources. In your log files, snpEff ran but looked to not complete properly. It never reported an error, which allowed the pipeline to keep running. I'm doing a reasonably sized update at the moment and will include that bug. I'll let you know when I've pushed the update.
Still not exactly sure if this is the problem, but worth a shot
Hi Koen,
I think I've (hopefully) fixed the error now. Could you update your installation and try again?
Thanks
Derek
Updated and ran again. It works like a charm now! Thank you so much for ironing out this issue @dsarov !
Fantastic!
Get Outlook for Androidhttps://aka.ms/AAb9ysg
From: Koen-vdl @.> Sent: Tuesday, May 7, 2024 7:39:36 PM To: dsarov/ARDaP @.> Cc: Derek Sarovich @.>; Mention @.> Subject: Re: [dsarov/ARDaP] Ardap doesn't report correctly (Issue #33)
CAUTION:This is an external email. Please take care when selecting links or opening attachments. When in doubt, please contact the IT Service Desk or IT Student Help Desk.
Updated and ran again. It works like a charm now! Thank you so much for ironing out this issue @dsarovhttps://github.com/dsarov !
— Reply to this email directly, view it on GitHubhttps://github.com/dsarov/ARDaP/issues/33#issuecomment-2097879490, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABVRRXBUYDKNOGG7JMCEZLDZBCONRAVCNFSM6AAAAABGWJAAY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXHA3TSNBZGA. You are receiving this because you were mentioned.Message ID: @.***>
Hi,
I noticed all of my burky isolates have no drug resistance predicted after completing
ardap
. I troubleshooted by downloading some read-sets from theardap
paper and confirmedardap
fails to report any resistance determinants. I would appreciate advice to getardap
to report correctly.I installed the conda software environment as detailed below as the recommended approach (yaml with more version fixing) didn't work:
I then run ardap like this:
nextflow run ~/bin/ardap/main.nf --species Burkholderia_pseudomallei --fastq "/srv/READ_ARCHIVE/trimmomatic/$id/$id*R{1,2}.fastq.gz"
ardap runs complete successfully with no errors reported:
However, the reports indicate "No antibiotic resistance identified".
I tested it with SRR3404582 reads from the ardap paper (sample MSHR5654) which should report 3 resistance determinates. As you can see none are reported: