GunzIvan28 / rMAP

Bacterial analysis toolbox for full ESKAPE pathogen characterization and profiling the resistome, mobilome, virulome & phylogenomics using WGS
GNU General Public License v3.0

unable to make assembly contig.fa #4

Closed: Javaria-Ashraf closed this issue 8 months ago.

Javaria-Ashraf commented 3 years ago

Dear Ivan,

I'm unable to get an assembly, and because the assembly fails, all the other steps fail as well.

Please help me with this error:

```
rMAP is will now perform De-novo Genome Assembly using shovill...
Processing sample: 7
Unknown option: assembler
Synopsis:
  Faster de novo assembly pipeline based around Spades
Usage:
  shovill [options] --outdir DIR --R1 R1.fq.gz --R2 R2.fq.gz
Author:
  Torsten Seemann <torsten.seemann@gmail.com>
Options:
  --help          This help
  --version       Print version and exit
  --check         Check dependencies are installed
  --debug         Debug info (default: OFF)
  --cpus N        Number of CPUs to use (default: 16)
  --outdir XXX    Output folder (default: '')
  --namefmt XXX   Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
  --force         Force overwite of existing output folder (default: OFF)
  --R1 XXX        Read 1 FASTQ (default: '')
  --R2 XXX        Read 2 FASTQ (default: '')
  --depth N       Sub-sample --R1/--R2 to this depth. Disable with --depth 0 (default: 100)
  --gsize XXX     Estimated genome size <blank=AUTODETECT> (default: '')
  --kmers XXX     K-mers to use <blank=AUTO> (default: '')
  --opts XXX      Extra SPAdes options eg. --plasmid --sc ... (default: '')
  --nocorr        Disable post-assembly correction (default: OFF)
  --trim          Use Trimmomatic to remove common adaptors first (default: OFF)
  --trimopt XXX   Trimmomatic options (default: 'ILLUMINACLIP:/home/isd/miniconda3/envs/rMAP-1.0/bin/../db/trimmomatic.fa:1:30:11 LEADING:3 TRAILING:3 MINLEN:30 TOPHRED33')
  --minlen N      Minimum contig length <0=AUTO> (default: 1)
  --mincov n.nn   Minimum contig coverage <0=AUTO> (default: 2)
  --asm XXX       Spades result to correct: before_rr contigs scaffolds (default: 'contigs')
  --tmpdir XXX    Fast temporary directory (default: '/tmp')
  --ram n.nn      Try to keep RAM usage below this many GB (default: 8)
  --keepfiles     Keep intermediate files (default: OFF)
Documentation:
  https://github.com/tseemann/shovill
mv: cannot stat '/home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/contigs.fa': No such file or directory
/home/isd/miniconda3/envs/rMAP-1.0/bin/rMAP: line 412: /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/7-assembly-stats.tab: No such file or directory
/home/isd/miniconda3/envs/rMAP-1.0/bin/rMAP: line 413: /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/7-assembly-stats.txt: No such file or directory
Your Assembly Run Took Approximately: 0 seconds.
```

My command is:

`rMAP -t 8 --reference /home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk --input /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_datasets --output rMa_output_sal --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele`

When the run reached the variant-calling step, it gave the following errors:

rMAP is will now perform Variant Calling...
.gbk found !!! Annotation Mode Enabled...Preparing Annotation files...
Traceback (most recent call last):
  File "/home/isd/miniconda3/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'rMa_output_sal/references/home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk'
Traceback (most recent call last):
  File "/home/isd/miniconda3/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'rMa_output_sal/references/home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk'
[bwa_idx_build] fail to open file 'rMa_output_sal/references/*.fa' : No such file or directory
[E::fai_build3_core] Failed to open the file rMa_output_sal/references/*.fa
[faidx] Could not build fai index rMa_output_sal/references/*.fa.fai
Processing sample: 7
[E::bwa_idx_load_from_disk] fail to locate the index files
[samclip] ERROR: Can't see 'rMa_output_sal/references/*.fai' index. Run 'samtools faidx rMa_output_sal/references/*.fai' ?
samtools sort: failed to read header from "-"
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "rMa_output_sal/variant_calling/7.mrkdup.bam" is in a format that cannot be usefully indexed
Your Alignment Run Took Approximately: 1 seconds.

rMAP is will now perform Variant Calling ...
Processing sample: 7
could not open rMa_output_sal/references/*.fa
normalize v0.5

options:     input VCF file                                  -
         [o] output VCF file                                 -
         [w] sorting window size                             10000
         [n] no fail on reference inconsistency for non SNPs false
         [q] quiet                                           false
         [d] debug                                           false
         [r] reference FASTA file                            rMa_output_sal/references/*.fa

Failed to read from rMa_output_sal/variant_calling/7.raw.vcf: unknown file type
[bcf_ordered_reader.cpp:49 BCFOrderedReader] Not a VCF/BCF file: -
Failed to read from standard input: unknown file type
Your Variant Call Run Took Approximately: 0 seconds.

rMAP is will now perform Annotation of Variants...
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar': No such file or directory
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/*': No such file or directory
Processing sample: 7
cat: /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.config: No such file or directory
cp: cannot stat 'rMa_output_sal/references/.fa': No such file or directory
cp: cannot stat 'rMa_output_sal/references/.gff3': No such file or directory
gzip: rMa_output_sal/variant_calling/snps/7/references/ref/genes.gff: No such file or directory
00:00:00    SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00    Command: 'build'
00:00:00    Building database for 'ref'
00:00:00    Reading configuration file 'rMa_output_sal/variant_calling/snps/7/references/snpEff.config'. Genome: 'ref'
00:00:00    Reading config file: /home/isd/rMa_output_sal/variant_calling/snps/7/references/snpEff.config
java.lang.RuntimeException: Error parsing property 'ref..codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:173)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:662)
    at org.snpeff.snpEffect.Config.init(Config.java:487)
    at org.snpeff.snpEffect.Config.<init>(Config.java:121)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:365)
    at org.snpeff.SnpEff.run(SnpEff.java:1188)
    at org.snpeff.SnpEff.main(SnpEff.java:168)
00:00:00    Logging
00:00:02    Done.
Error: Unable to access jarfile /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar
Loading reference: rMa_output_sal/variant_calling/snps/7/references/genomes/ref.fa

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'rMa_output_sal/variant_calling/snps/7/references/genomes/ref.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:435
STACK: /home/isd/miniconda3/envs/rMAP-1.0/bin/snippy-vcf_to_tab:39
-----------------------------------------------------------
Your Variant Annotation Run Took Approximately: 2 seconds.
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar': No such file or directory
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/*': No such file or directory
cat: /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.config: No such file or directory
cp: cannot stat 'rMa_output_sal/references/.fa': No such file or directory
cp: cannot stat 'rMa_output_sal/references/.gff3': No such file or directory
gzip: rMa_output_sal/variant_calling/snps/combined-snps/references/ref/genes.gff: No such file or directory
00:00:00    SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00    Command: 'build'
00:00:00    Building database for 'ref'
00:00:00    Reading configuration file 'rMa_output_sal/variant_calling/snps/combined-snps/references/snpEff.config'. Genome: 'ref'
00:00:00    Reading config file: /home/isd/rMa_output_sal/variant_calling/snps/combined-snps/references/snpEff.config
java.lang.RuntimeException: Error parsing property 'ref..codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
    at org.snpeff.snpEffect.Config.createCodonTables(Config.java:173)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:662)
    at org.snpeff.snpEffect.Config.init(Config.java:487)
    at org.snpeff.snpEffect.Config.<init>(Config.java:121)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:365)
    at org.snpeff.SnpEff.run(SnpEff.java:1188)
    at org.snpeff.SnpEff.main(SnpEff.java:168)
00:00:00    Logging
00:00:01    Done.
Error: Unable to access jarfile /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar
Loading reference: rMa_output_sal/variant_calling/snps/combined-snps/references/genomes/ref.fa

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'rMa_output_sal/variant_calling/snps/combined-snps/references/genomes/ref.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:435
STACK: /home/isd/miniconda3/envs/rMAP-1.0/bin/snippy-vcf_to_tab:39
-----------------------------------------------------------
Your Run Took Approximately: 0 seconds.
VCF Annotation Successfuly Completed in: 0 seconds.
Your Variant Call Run Took Approximately: 1 seconds.

Can you please help me with these errors?

Thank you in advance!

Best wishes,

Jia

Javaria-Ashraf commented 3 years ago

Dear Ivan,

I think I have figured out the issue: the shovill version that the script installs by default is 0.9.0, which has no --assembler option. Here is its help output:

shovill
Synopsis:
  Faster de novo assembly pipeline based around Spades
Usage:
  shovill [options] --outdir DIR --R1 R1.fq.gz --R2 R2.fq.gz
Author:
  Torsten Seemann <torsten.seemann@gmail.com>
Options:
  --help          This help
  --version       Print version and exit
  --check         Check dependencies are installed
  --debug         Debug info (default: OFF)
  --cpus N        Number of CPUs to use (default: 16)
  --outdir XXX    Output folder (default: '')
  --namefmt XXX   Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
  --force         Force overwite of existing output folder (default: OFF)
  --R1 XXX        Read 1 FASTQ (default: '')
  --R2 XXX        Read 2 FASTQ (default: '')
  --depth N       Sub-sample --R1/--R2 to this depth. Disable with --depth 0 (default: 100)
  --gsize XXX     Estimated genome size <blank=AUTODETECT> (default: '')
  --kmers XXX     K-mers to use <blank=AUTO> (default: '')
  --opts XXX      Extra SPAdes options eg. --plasmid --sc ... (default: '')
  --nocorr        Disable post-assembly correction (default: OFF)
  --trim          Use Trimmomatic to remove common adaptors first (default: OFF)
  --trimopt XXX   Trimmomatic options (default: 'ILLUMINACLIP:/home/isd/miniconda3/envs/rMAP-1.0/bin/../db/trimmomatic.fa:1:30:11 LEADING:3 TRAILING:3 MINLEN:30 TOPHRED33')
  --minlen N      Minimum contig length <0=AUTO> (default: 1)
  --mincov n.nn   Minimum contig coverage <0=AUTO> (default: 2)
  --asm XXX       Spades result to correct: before_rr contigs scaffolds (default: 'contigs')
  --tmpdir XXX    Fast temporary directory (default: '/tmp')
  --ram n.nn      Try to keep RAM usage below this many GB (default: 8)
  --keepfiles     Keep intermediate files (default: OFF)

$ shovill --version
shovill 0.9.0

By contrast, the synopsis for the newer version shown in the GitHub README does include --assembler.
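A quick way to tell which case applies to a given installation is to search the installed shovill's help text for `--assembler`. The sketch below is a hypothetical helper (`has_assembler_option` is not part of rMAP or shovill); it relies only on `shovill --help`, which the output above shows exists, and assumes the option name appears literally in the help text when it is supported.

```shell
# Hypothetical helper (not part of rMAP or shovill): succeeds only if the
# shovill help text supplied on stdin lists an --assembler option.
has_assembler_option() {
  # "--" stops grep from treating the pattern itself as an option.
  grep -q -- '--assembler'
}

# Example use against a real installation (shovill must be on PATH):
#   shovill --version
#   if shovill --help 2>&1 | has_assembler_option; then
#     echo "this shovill accepts --assembler"
#   else
#     echo "this shovill has no --assembler option"
#   fi
```

Reading the help text from stdin means the same check can also be run on saved help output, such as the listing pasted above.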

How do I fix this?

Thanks

Jia

GunzIvan28 commented 3 years ago

Hey Ashraf, while troubleshooting the error you encountered, I tried a stand-alone run of shovill after removing the --assembler option from my code, and found that the program keeps failing because some Perl modules can't be loaded properly. The snippet below shows the new errors that come up. I think Torsten, the author, changed something in the code.

$ shovill --R1 dataset-profile/trimmed_reads/ID246.clean_1.fastq.gz --R2 dataset-profile/trimmed_reads/ID246.clean_2.fastq.gz --cpus 12 --gsize 3.4M --force --outdir dataset-profile/assembly/ID246
Hello ivan
You ran: /home/ivan/miniconda3/envs/rMAP-1.0/bin/shovill --R1 dataset-profile/trimmed_reads/ID246.clean_1.fastq.gz --R2 dataset-profile/trimmed_reads/ID246.clean_2.fastq.gz --cpus 12 --gsize 3.4M --force --outdir dataset-profile/assembly/ID246
This is shovill 0.9.0
Written by Torsten Seemann <torsten.seemann@gmail.com>
Homepage is https://github.com/tseemann/shovill
Operating system is linux
Using seqtk - /home/ivan/miniconda3/envs/rMAP-1.0/bin/seqtk
Using pigz - /home/ivan/miniconda3/envs/rMAP-1.0/bin/pigz
Using kmc - /home/ivan/miniconda3/envs/rMAP-1.0/bin/kmc
Using trimmomatic - /home/ivan/miniconda3/envs/rMAP-1.0/bin/trimmomatic
Using lighter - /home/ivan/miniconda3/envs/rMAP-1.0/bin/lighter
Using flash - /home/ivan/miniconda3/envs/rMAP-1.0/bin/flash
Using spades.py - /home/ivan/miniconda3/envs/rMAP-1.0/bin/spades.py
Using bwa - /home/ivan/miniconda3/envs/rMAP-1.0/bin/bwa
Using samtools - /home/ivan/miniconda3/envs/rMAP-1.0/bin/samtools
Using java - /home/ivan/miniconda3/envs/rMAP-1.0/bin/java
Using pilon - /home/ivan/miniconda3/envs/rMAP-1.0/bin/pilon
Using tee - /home/ivan/miniconda3/envs/rMAP-1.0/bin/tee
Using vmstat - /usr/bin/vmstat
Error: /proc must be mounted
  To mount /proc at boot you need an /etc/fstab line like:
      proc   /proc   proc    defaults
  In the meantime, run "mount proc /proc -t proc"
Could not determine available RAM  

While I look for a fix, please use the megahit assembler for the time being; I tested it and rMAP runs fine with it. When I find a fix for shovill, I will update this issue, so let's keep it open for now.
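The log above suggests shovill stopped because `vmstat` could not read `/proc` and so shovill could not determine how much RAM was free. A minimal pre-flight check, assuming a Linux system where memory information normally lives in `/proc/meminfo`, is sketched below; `mem_available_gb` is a hypothetical name, and it simply reports the kernel's `MemAvailable` figure in GiB or fails if that file cannot be read.

```shell
# Hypothetical pre-flight check (not part of rMAP or shovill).
# Prints MemAvailable from a meminfo file in GiB, or fails if the file is
# unreadable (e.g. /proc not mounted) or has no MemAvailable line.
mem_available_gb() {
  local meminfo="${1:-/proc/meminfo}"
  if [ ! -r "$meminfo" ]; then
    echo "cannot read $meminfo; is /proc mounted?" >&2
    return 1
  fi
  # MemAvailable is reported in kB; 1048576 kB = 1 GiB.
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576; found = 1 }
       END { exit !found }' "$meminfo"
}

# Example: mem_available_gb || echo "fix /proc before running shovill"
```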

Thanks Ivan

safinaARK commented 2 years ago

Dear @GunzIvan28, there are two issues: one with the shovill assembler and one with snpEff.

First, when rMAP is installed via conda, it installs shovill version 0.9.0 by default, and that version does not have the --assembler option. The quick fix is to edit the rMAP script and remove the --assembler option.

Second, the script refers to snpEff version 4.5covid19-1, whereas conda installs v5.0 by default. Changing that version name in the rMAP script solves this issue too.

@Javaria-Ashraf, please make these changes. I hope they work for you as they did for me. Thanks!
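The first fix can be scripted. The sketch below assumes the rMAP script passes the option to shovill as `--assembler <value>` on a single line; the exact text in rMAP is not shown in this thread. `strip_assembler_option` is a hypothetical helper that deletes every such pair and keeps a `.bak` copy of the original script. Inspecting the matches first with `grep -n -- '--assembler' "$(command -v rMAP)"` is a sensible precaution.

```shell
# Hypothetical helper (not part of rMAP): remove every "--assembler <value>"
# pair from a script, keeping the original as <script>.bak.
# Assumes the option and its value are separated by whitespace on one line.
strip_assembler_option() {
  local script="$1"
  # -i.bak edits in place with a backup (works with GNU and BSD sed);
  # -E enables extended regular expressions.
  sed -i.bak -E 's/[[:space:]]+--assembler[[:space:]]+[^[:space:]]+//g' "$script"
}

# Example, with the rMAP-1.0 conda environment activated:
#   strip_assembler_option "$(command -v rMAP)"
```

Note that rMAP's own `--assembly` flag does not match the pattern, so only the option passed through to shovill is affected.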

SAR

GunzIvan28 commented 2 years ago

@safinaARK, I commend you for taking the time to understand this pipeline as well as we, the authors, do. Version 1 of rMAP was put together during the COVID-19 pandemic, when snpEff 4.5covid19 was what was available. Depending on whether you are using macOS or Ubuntu, the snpEff version referenced in both the installers and the main script will have to change. I changed the version in the Linux installer script to 5.0 because packages kept conflicting, but I haven't yet made this change in the macOS installer or the main script (both still use snpEff 4.5covid19).
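Because the installed snpEff version differs by platform, one option, sketched here under assumptions, is to look up the snpEff directory that actually exists in the active conda environment instead of hard-coding it. The original error log shows rMAP expecting `share/snpeff-4.5covid19-1/snpEff.jar` under the environment prefix. `find_snpeff_dir` and `retarget_snpeff` are hypothetical helpers: the first prints the name of whichever `share/snpeff-*` directory contains `snpEff.jar` (the first match, if several exist), and the second substitutes that name into the script, keeping a `.bak` copy.

```shell
# Hypothetical helper (not part of rMAP): print the name of the first
# share/snpeff-* directory under a conda prefix that contains snpEff.jar.
find_snpeff_dir() {
  local prefix="$1" dir
  for dir in "$prefix"/share/snpeff-*; do
    # If the glob matches nothing, "$dir" is the literal pattern and fails -f.
    if [ -f "$dir/snpEff.jar" ]; then
      basename "$dir"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: replace the hard-coded snpeff-4.5covid19-1 directory
# name in a script with the given name, keeping the original as <script>.bak.
retarget_snpeff() {
  local script="$1" new_dir="$2"
  sed -i.bak "s/snpeff-4\.5covid19-1/${new_dir}/g" "$script"
}

# Example, with the rMAP-1.0 conda environment activated:
#   dir=$(find_snpeff_dir "$CONDA_PREFIX") && retarget_snpeff "$(command -v rMAP)" "$dir"
```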

So @Javaria-Ashraf, you can either do as @safinaARK has directed, or I can walk you through the changes you need to make (again, depending on your system) whenever you find some time. All of these changes will be incorporated into version 2, which we will release soon.

Lastly, @safinaARK, I think you would be a good addition to the rMAP development team. We are currently developing version 2, which we are implementing on the Nextflow platform. If you are interested, I could add you as a collaborator. Just let me know.

Ivan

safinaARK commented 2 years ago

@GunzIvan28

Thank you, Ivan, for the offer 😊. I would love to be part of the rMAP development team! Please let me know how I can help. You can contact me by email at safina.razzak@aku.edu or safina.arazzak@gmail.com.

Thanks once again for your acknowledgement.

Regards

SAR