Closed by mictadlo 3 years ago
Hi Michal,
Could you please upload the following log files so I can check what's going on:
Best, Alex
Hi Michal,
I checked the logs: Wengan stopped because the long-read sequences contain N or another invalid character. Wengan gives the following message:
terminate called after throwing an instance of 'std::invalid_argument'
what(): invalid DNA base found in DnaBitset class
You can replace the invalid characters in the long-read sequences using my seqtk fork and the command:
seqtk iupac2bases long-reads.fastq.gz | gzip > long-reads.clean.fq.gz
Then you can resume the assembly, but using the cleaned long-read file.
Best, Alex
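Before resuming, it can help to confirm that the cleaned file no longer contains characters the assembler rejects. The following is a minimal sketch, not part of Wengan or seqtk: `count_invalid_reads` is a hypothetical helper. It assumes a FASTQ file with exactly four lines per record, and that only A, C, G and T (in either case) are accepted, which the error message suggests but does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wengan or seqtk): count reads whose
# sequence line contains anything other than A, C, G or T.
# Assumption: FASTQ input with exactly four lines per record.
# "gzip -dcf" decompresses gzip input and passes plain text through unchanged.
count_invalid_reads() {
    gzip -dcf "$1" |
        awk 'NR % 4 == 2 && /[^ACGTacgt]/ { n++ } END { print n + 0 }'
}

# Example (file name taken from the command above):
# count_invalid_reads long-reads.clean.fq.gz
```

A result of 0 would indicate that no sequence line contains a character outside the assumed alphabet.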
Feel free to reopen if you have further questions! Best, Alex
Hi Alex, I used 3 TB of memory and it was still not enough. Is there a way to reduce the amount of memory?
export MALLOC_PER_THREAD=1
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="1740D-43-06_S0_L001_R1.fastq.gz,1740D-43-06_S0_L001_R2.fastq.gz" OUT_DIR=/tmp/asm1D NUM_THREADS=8 2> asm1.Disco_denovo.err > asm1.Disco_denovo.log
/bin/sh: line 1: 153941 Killed /lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="1740D-43-06_S0_L001_R1.fastq.gz,1740D-43-06_S0_L001_R2.fastq.gz" OUT_DIR=/tmp/asm1D NUM_THREADS=8 2> asm1.Disco_denovo.err > asm1.Disco_denovo.log
asm1.mk:4: recipe for target 'asm1.contigs-disco.fa' failed
make: *** [asm1.contigs-disco.fa] Error 137
PBS Job 9607252.pbs
CPU time : 18:48:50
Wall time : 03:38:03
Mem usage : 3145728000kb
> cat asm1.mk
.DELETE_ON_ERROR:
#Wengan automatic generated makefile
asm1.contigs-disco.fa :
export MALLOC_PER_THREAD=1
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="1740D-43-06_S0_L001_R1.fastq.gz,1740D-43-06_S0_L001_R2.fastq.gz" OUT_DIR=/tmp/asm1D NUM_THREADS=8 2> asm1.Disco_denovo.err > asm1.Disco_denovo.log
cp -a /tmp/asm1D asm1D
ln -s asm1D/a.final/a.lines.fasta asm1.contigs-disco.fa
asm1.contigs.disco.fa : asm1.contigs-disco.fa
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/seqtk cutN -n 1 asm1.contigs-disco.fa | /lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/seqtk seq -L 200 - | /lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/seqtk iupac2bases - | /lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/seqtk rename - D | /lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/seqtk seq -l 60 - > asm1.contigs.disco.fa
asm11.fm.sam : asm1.contigs.disco.fa
@echo asm11 1740D-43-06_S0_L001_R1.fastq.gz 1740D-43-06_S0_L001_R2.fastq.gz > asm1.fms.txt
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/fastmin-sg shortr -c 50 -k 21 -w 10 -q 20 -r 50000 -t 8 asm1.contigs.disco.fa asm1.fms.txt 2>asm1.fms.err >asm1.fms.log
asm1.im.1.I1000.fm.sam : asm1.im.1.I500.fm.sam
asm1.im.1.I2000.fm.sam : asm1.im.1.I1000.fm.sam
asm1.im.1.I500.fm.sam : asm11.fm.sam
@echo asm1.im.1 allPacBio-ONT.clean.fasta.gz > asm1.fml.im.txt
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/fastmin-sg pacraw -k 20 -w 5 -q 40 -m 150 -r 300 -I 500,1000,2000 -t 8 asm1.contigs.disco.fa asm1.fml.im.txt 2>asm1.fml.im.err >asm1.fml.im.log
-rm -f longreads.asm1.im.1.fa
asm1.MBC1.msplit.fa : asm1.contigs.disco.fa asm11.fm.sam asm1.im.1.I500.fm.sam asm1.im.1.I1000.fm.sam asm1.im.1.I2000.fm.sam
@echo "asm11.fm.sam 0" > asm1.fms.sams.txt
@echo "asm1.im.1.I500.fm.sam 1" >> asm1.fms.sams.txt
@echo "asm1.im.1.I1000.fm.sam 1" >> asm1.fms.sams.txt
@echo "asm1.im.1.I2000.fm.sam 1" >> asm1.fms.sams.txt
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/intervalmiss -d 1 --clib 1 -t 8 -s asm1.fms.sams.txt -c asm1.contigs.disco.fa -p asm1 2>asm1.im.err >asm1.im.log
asm1.MBC1.msplit.cov.txt : asm1.MBC1.msplit.fa
grep ">" asm1.MBC1.msplit.fa | sed 's/>//' | awk '{print $$1" "$$2}' | sed 's/>//g' > asm1.MBC1.msplit.cov.txt
asm11.I500.fm.sam : longreads.asm11.fa
asm11.I1000.fm.sam : asm11.I500.fm.sam
asm11.I2000.fm.sam : asm11.I1000.fm.sam
asm11.I3000.fm.sam : asm11.I2000.fm.sam
asm11.I4000.fm.sam : asm11.I3000.fm.sam
asm11.I5000.fm.sam : asm11.I4000.fm.sam
asm11.I6000.fm.sam : asm11.I5000.fm.sam
asm11.I7000.fm.sam : asm11.I6000.fm.sam
asm11.I8000.fm.sam : asm11.I7000.fm.sam
asm11.I10000.fm.sam : asm11.I8000.fm.sam
asm11.I15000.fm.sam : asm11.I10000.fm.sam
asm11.I20000.fm.sam : asm11.I15000.fm.sam
longreads.asm11.fa : asm1.MBC1.msplit.fa asm1.MBC1.msplit.cov.txt
@echo asm11 allPacBio-ONT.clean.fasta.gz > asm1.fml.txt
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/fastmin-sg pacraw -k 20 -w 5 -q 40 -m 150 -r 300 -t 8 -p asm1 -I 500,1000,2000,3000,4000,5000,6000,7000,8000,10000,15000,20000 asm1.MBC1.msplit.fa asm1.fml.txt 2>asm1.fml.err >asm1.fml.log
asm1.SPolished.asm.wengan.fasta : asm1.MBC1.msplit.fa longreads.asm11.fa asm11.I500.fm.sam asm11.I1000.fm.sam asm11.I2000.fm.sam asm11.I3000.fm.sam asm11.I4000.fm.sam asm11.I5000.fm.sam asm11.I6000.fm.sam asm11.I7000.fm.sam asm11.I8000.fm.sam asm11.I10000.fm.sam asm11.I15000.fm.sam asm11.I20000.fm.sam
@echo asm11.I500.fm.sam > asm1.sams.txt
@echo asm11.I1000.fm.sam >> asm1.sams.txt
@echo asm11.I2000.fm.sam >> asm1.sams.txt
@echo asm11.I3000.fm.sam >> asm1.sams.txt
@echo asm11.I4000.fm.sam >> asm1.sams.txt
@echo asm11.I5000.fm.sam >> asm1.sams.txt
@echo asm11.I6000.fm.sam >> asm1.sams.txt
@echo asm11.I7000.fm.sam >> asm1.sams.txt
@echo "asm11.I8000.fm.sam 8000" >> asm1.sams.txt
@echo "asm11.I10000.fm.sam 10000" >> asm1.sams.txt
@echo "asm11.I15000.fm.sam 15000" >> asm1.sams.txt
@echo "asm11.I20000.fm.sam 20000" >> asm1.sams.txt
/lustre/work-lustre/waterhouse_team/apps/wengan-v0.2-bin-Linux/bin/liger --mlp 10000 --mit 20000000 -t 8 -c asm1.MBC1.msplit.fa -l longreads.asm11.fa -d asm1.MBC1.msplit.cov.txt -p asm1 -s asm1.sams.txt 2>asm1.liger.err >asm1.liger.log
all : asm1.SPolished.asm.wengan.fasta
clean :
-rm -f asm1.contigs-disco.fa asm1.contigs.disco.fa asm11.fm.sam asm1.im.1.I500.fm.sam asm1.im.1.I1000.fm.sam asm1.im.1.I2000.fm.sam asm1.MBC1.msplit.fa asm1.MBC1.msplit.cov.txt longreads.asm11.fa asm11.I500.fm.sam asm11.I1000.fm.sam asm11.I2000.fm.sam asm11.I3000.fm.sam asm11.I4000.fm.sam asm11.I5000.fm.sam asm11.I6000.fm.sam asm11.I7000.fm.sam asm11.I8000.fm.sam asm11.I10000.fm.sam asm11.I15000.fm.sam asm11.I20000.fm.sam asm1.SPolished.asm.wengan.fasta
> cat asm1.Disco_denovo.log
Performing re-exec to adjust stack size.
--------------------------------------------------------------------------------
Sat Jun 12 03:22:39 2021 run on cl4n001, pid=153941 [Nov 5 2019 06:43:49 R51885 ]
DiscovarExp \
READS="1740D-43-06_S0_L001_R1.fastq.gz,1740D-43-06_S0_L001_R2.fast \
q.gz" OUT_DIR=/tmp/asm1D NUM_THREADS=8
--------------------------------------------------------------------------------
Sat Jun 12 03:22:39 2021: Warning: recommend doing 'setenv MALLOC_PER_THREAD 1'
Sat Jun 12 03:22:39 2021: before Discovar, to improve computational performance.
INPUT FILES:
[1a,type=frag,sample=C,lib=1,frac=1] 1740D-43-06_S0_L001_R1.fastq.gz
[1b,type=frag,sample=C,lib=1,frac=1] 1740D-43-06_S0_L001_R2.fastq.gz
Sat Jun 12 06:27:13 2021: using 1,094,669,784 reads
Sat Jun 12 06:27:13 2021: data extraction complete
3.08 hours used extracting reads
Sat Jun 12 06:27:13 2021: see total physical memory of 6,222,716,518,400 bytes
Sat Jun 12 06:27:13 2021: 37.65 bytes per read base, assuming max memory available
We need 1 passes.
Expect 2582287126 keys per batch.
Provide 3204227598 keys per batch.
Have I done anything wrong?
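The read count in the log above allows a rough estimate of short-read coverage, which is useful when judging whether the input is deeper than the assembler needs. The sketch below is illustrative only: `estimate_coverage` and `subsample_fraction` are hypothetical helpers, and the read length and genome size are assumptions. If the "37.65 bytes per read base" figure is total memory divided by total read bases, the reads average about 151 bases, but the log does not state this directly. The 3 Gb genome size is taken from the original question.

```shell
#!/bin/sh
# Rough coverage estimate: reads * read_length / genome_size.
# All three values are supplied by the caller; nothing is read from Wengan.
estimate_coverage() {
    awk -v reads="$1" -v len="$2" -v genome="$3" \
        'BEGIN { printf "%.1f\n", reads * len / genome }'
}

# Fraction of reads to keep in order to reach a target coverage (capped at 1).
subsample_fraction() {
    awk -v cov="$1" -v target="$2" \
        'BEGIN { f = target / cov; if (f > 1) f = 1; printf "%.2f\n", f }'
}

# 1,094,669,784 reads (from the log), assumed 151-base reads,
# assumed 3,000,000,000-base genome.
estimate_coverage 1094669784 151 3000000000   # prints 55.1
```

If the estimate were well above the desired depth, `subsample_fraction` gives the proportion of read pairs to keep. For example, `subsample_fraction 80 60` prints 0.75.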
Hi Michal,
The recommended coverage for DiscovarDenovo is 60X; if you have more coverage, you need to subsample the short reads. From the read count, I guess you have about 56X of short-read coverage, which is fine. However, you need to set DiscovarDenovo's MAX_MEM_GB to 500 or 600 GB; otherwise it will try to use the maximum memory available on the machine, which seems to be about 6 TB. To set the maximum memory, manually edit the makefile generated by Wengan (the prefix.mk file) and then run the pipeline with:
make -f prefix.mk all
Hope this helps,
Best, Alex
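The manual makefile edit described above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official Wengan procedure: `set_discovar_max_mem` is a hypothetical helper, and it assumes that MAX_MEM_GB is passed as a `KEY=VALUE` argument in the same style as `NUM_THREADS=8` on the DiscovarExp line, with the value given in gigabytes.

```shell
#!/bin/sh
# Hypothetical helper: add MAX_MEM_GB=<gb> to the DiscovarExp line of a
# Wengan-generated makefile. A backup copy is kept as <file>.bak.
# Assumption: MAX_MEM_GB is given as a KEY=VALUE argument, like NUM_THREADS.
set_discovar_max_mem() {
    mk=$1
    gb=$2
    # Do nothing if a limit is already present on a DiscovarExp line.
    if grep -q 'DiscovarExp.*MAX_MEM_GB=' "$mk"; then
        return 0
    fi
    # Insert the argument just before NUM_THREADS=, only on DiscovarExp lines.
    sed -i.bak "/DiscovarExp/ s/ NUM_THREADS=/ MAX_MEM_GB=${gb}&/" "$mk"
}

# Example, using the file name from the reply above:
# set_discovar_max_mem prefix.mk 500
# make -f prefix.mk all
```

Editing only the DiscovarExp line leaves the other pipeline steps untouched, and the `.bak` copy makes the change easy to revert.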
Hi, I tried to assemble a 3 Gb allotetraploid plant genome. However, it crashed on 5.5 GB of RAM.
How is it possible to reduce the memory requirements?
Thank you in advance,
Michal