jingzhejiang opened this issue 4 years ago (status: Open)
I would recommend not using swap at all, and turning the NVMe drives into a RAID-0 array for scratch space to hold the database and the files being processed. If your databases aren't on a high-speed I/O device, the blastn steps can be significantly impacted.
You can replicate what the installer does to create a RAID-0 array on any system with more than one local SSD:
sudo mdadm --create /dev/md0 --level=0 --raid-devices=$(sudo nvme list | grep "Amazon EC2 NVMe Instance Storage" | wc -l) $(sudo nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -f1 -d " " | tr "\n" " ")
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /scratch
Those commands should make a RAID-0 array with whatever local SSDs you have and mount it to /scratch.
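A quick way to sanity-check the array after running the commands above (standard mdadm/util-linux diagnostics, nothing Virmap-specific; exact output will vary by instance):

```shell
cat /proc/mdstat              # md0 should be listed as an active raid0
sudo mdadm --detail /dev/md0  # member devices, chunk size, and array state
df -h /scratch                # mounted capacity (roughly 4 x 800 GB on a c5d.24xlarge)
```

Keep in mind that instance-store volumes are ephemeral, so the array and filesystem have to be recreated after a stop/start of the instance.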
Regarding your error: it's not a memory error. You have to raise the open-file limit, and I would raise the thread limit too, just in case.
You can change the limits using ulimit:
ulimit -a
to find out your current limits
ulimit -n 65536
to set max open files to 65536
ulimit -u 4096
to set max threads to 4096
Hard limits differ from system to system. I don't remember the hard limits on Amazon Linux 2, but I think they are higher than the values suggested above.
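As a reference point, ulimit distinguishes soft limits (what processes actually get) from hard limits (the ceiling a non-root user can raise the soft limit to). You can query both before changing anything; these are standard bash builtin flags, not specific to Amazon Linux 2:

```shell
ulimit -Sn   # soft limit on open files
ulimit -Hn   # hard limit on open files (raising -n past this needs root)
ulimit -Su   # soft limit on user processes/threads
ulimit -Hu   # hard limit on user processes/threads
```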
There is no ability to resume from a previous run, sorry. However, depending on what you are looking for, you could potentially skip the time-intensive megahit and iterative-improvement steps. Try running with and without --noAssembly and/or --noIterImp, and see if the results are similar enough that you can skip one or both of those steps.
Thank you for your instructions! I followed your no-swap, RAID-0, and ulimit suggestions. For the ulimit settings, I ran them as below:
# add the following to /etc/profile
ulimit -n 1024000
ulimit -u unlimited
ulimit -s unlimited
ulimit -i 255983
ulimit -SH unlimited
ulimit -f unlimited
# add the following to /etc/security/limits.conf
* hard nofile 1024000
* soft nofile 1024000
* hard nproc unlimited
* soft nproc unlimited
* soft core 0
* hard core 0
* soft sigpending 255983
* hard sigpending 255983
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 255983
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
After that I used parallel -j 3 --xapply
to run virmap (on a c5d.24xlarge (96 vCPU, 192 GiB, 4x 800 GB) instance), but still encountered errors at the 1S-WGA.combined.fa step.
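For context, parallel -j 3 keeps three jobs running at once (--xapply pairs arguments from multiple input sources one-to-one). The concurrency can be sketched in plain shell; run_virmap below is a hypothetical stand-in for the real Virmap command line, not part of the tool:

```shell
# Launch one background job per sample, then wait for all of them;
# roughly what `parallel -j 3 --xapply` does for three samples.
run_virmap() {
    echo "processing $1"   # placeholder for the actual Virmap invocation
}
for sample in 1S-WGA 2S-WGA 3S-WGA; do
    run_virmap "$sample" &
done
wait   # returns once all three background jobs have finished
```

One thing to keep in mind: three concurrent runs also share the instance's 192 GiB of RAM, so each job gets a smaller memory budget than a solo run would.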
The error log file:
TIME 1S-WGA decompress: 140.38 seconds
TIME 1S-WGA dereplicate: 238.73 seconds
TIME 1S-WGA normalize: 90.41 seconds
TIME 1S-WGA bbmap to virus: 1484.10 seconds
TIME 1S-WGA diamond to virus: 1204.22 seconds
TIME 1S-WGA construct superscaffolds: 312.40 seconds
TIME 1S-WGA megahit assembly: 5347.56 seconds
TIME 1S-WGA dedupe assembly: 43.31 seconds
TIME 1S-WGA merge assembly: 1427.51 seconds
FAlite: Empty
lbzip2: skipping "/scratch/tmp/sEC7KktVKx/1S-WGA.diamondFilter.out": lstat(): No such file or directory
TIME 1S-WGA diamond filter map: 0.51 seconds
FAlite: Empty
TIME 1S-WGA blastn filter map: 0.09 seconds
TIME 1S-WGA filter contigs: 27.09 seconds
FAlite: Empty
TIME 1S-WGA iterative improvement: 6.56 seconds
TIME 1S-WGA self align and quantify: 4.91 seconds
FAlite: Empty
TIME 1S-WGA blastn full: 0.24 seconds
FAlite: Empty
cat: /scratch/tmp/sEC7KktVKx/1S-WGA.diamondBlastx.out: No such file or directory
FAlite: Empty
FAlite: Empty
FAlite: Empty
lbzip2: skipping "/scratch/tmp/sEC7KktVKx/1S-WGA.diamondBlastx.out": lstat(): No such file or directory
lbzip2: skipping "/scratch/tmp/sEC7KktVKx/1S-WGA.diamondBlastx.remain.out": lstat(): No such file or directory
TIME 1S-WGA diamond full: 0.28 seconds
TIME 1S-WGA determine taxonomy: 27.37 seconds
TIME 1S-WGA Overall Virmap time: 10355.72 seconds
1S-WGA.combine.err
Outer iteration 0.0 of merge
Iteration 0.0 of merge
Called merge with strict
Building a new DB, current time: 11/06/2020 09:53:03
New DB name: /dev/shm/QZwLVQgAGs/easyMergeDb
New DB title: /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.prepped.iterationMerge.0.0.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 251296 sequences in 14.7508 seconds.
sh: line 1: 83775 Segmentation fault easyMerge.pl /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.prepped.iterationMerge.0.0.fa 96 strict > /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.iterationMerge.0.0.fa
FAlite: Empty
END Inner iteration 0.0 of merge
inner before after merge: 125648
inner after after mege count: 0
Iteration 0.1 of merge
FAlite: Empty
FAlite: Empty
Called merge with strict
FAlite: Empty
FAlite: Empty
FAlite: Empty
END Inner iteration 0.1 of merge
inner before after merge: 0
inner after after mege count: 0
Finished inner loop after 0.1 iterations
in was /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.iterationMerge.0.1.fa
BACK to outer loop
FAlite: Empty
END Outer iteration 0.1 of merge
before count 125648
out after count 0
FAlite: Empty
Outer iteration 1.1 of merge
FAlite: Empty
Iteration 1.1 of merge
FAlite: Empty
FAlite: Empty
Called merge with strict
FAlite: Empty
FAlite: Empty
FAlite: Empty
END Inner iteration 1.1 of merge
inner before after merge: 0
inner after after mege count: 0
Finished inner loop after 1.1 iterations
in was /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.iterationMerge.1.1.fa
BACK to outer loop
FAlite: Empty
END Outer iteration 1.1 of merge
before count 0
out after count 0
Finished merging after 1 iterations
printing infile /scratch/tmp/nzA_1WN3jC/1S-WGA.combined.fa.iterationMerge.1.1.fa
FAlite: Empty
I see a Segmentation fault again. Is it because I didn't set ulimit correctly?
Thank you again!
Ginger
Hmm, this one is tricky.
Would you be able to post a compressed version of the .combined.fa somewhere? I would need to test it, since there doesn't seem to be an easy explanation for this one. easyMerge.pl itself might be crashing, but it's written in Perl, so that seems unlikely. It could also be something it uses, but that would need inspecting your actual files. Some of your other samples go through without any issues, right?
In the meantime, I made some changes where I suspect easyMerge.pl is failing; you'll need to pull the repo again to get the updated version. Hopefully the fixes work.
Hi Matt,
Thank you for your help! Yes, some of my other samples (especially those with fewer and smaller viral contigs) go through without any issues. The 1S-WGA.combined.fa file is empty, so I suspect the error happens in the combine step. I have publicly uploaded all the files and logs generated during the assembly of the 1S-WGA library to my S3 bucket (the address has been sent to your email: torptube@gmail.com). I hope you can find some clues.
Ginger
It's failing to combine the pseudo-constructed assembly and the de novo assembly. The segfault happens in easyMerge.pl, but I am not sure what is causing it.
I didn't get an email with a link to the s3 bucket.
Did you update easyMerge.pl on your VM image? I updated it with some speculative fixes and a little more robust error reporting to try to track down the error. Can you pull it and run your sample again?
Thank you for your reply! I just sent another email from my Gmail account (jingzhejiang@gmail.com). I haven't tested the new easyMerge.pl yet; I'll try it and post any progress here.
Hi Matt,
I still encounter the same errors at the 1S-WGA.combined.fa step, even after updating to your new easyMerge.pl. Below are the intermediate files generated before the error appeared. I hope they are enough for you to analyze the problem. Thank you!
1S-WGA.combined.fa 0.0B
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.combined.fa.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T014249Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=6e4f07e163e8dd552bf244569d2f17e6d884f0c56bbecf93cad22dbe6726eed8
1S-WGA.combine.err 1.5KB
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.combine.err?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T014337Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=9288187a2fe9b6c738662296b8e9cab4165b43d96c81aba483f5ddc7b0d63842
1S-WGA.assembly.err.bz2 1.6KB
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.assembly.err.bz2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T014505Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=fbe2dc939b6b43cceb6a08620403bd82ef78f6d306202a9a5a455590b0e09d2b
1S-WGA.contigs.fa 34.7MB
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.contigs.fa.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T014626Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=1f1c7038b7ac06083404fcb9980f5a13a3c51615ac9930c2370d8948ec7f3728
1S-WGA.superScaffolds.err.bz2 6MB
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.superScaffolds.err.bz2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T015043Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=bc156c04bdc12ae12f9610e46cba9e0504bb661710345f741d6004ea6be8f4b4
1S-WGA.superScaffolds.fa 219.4 KB
https://ginger-ohio.s3-accelerate.amazonaws.com/JJZ/nanhai/VirMAP/1S-WGA/1S-WGA.superScaffolds.fa.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAI6UOUQUYB2IQ6HLA%2F20201223%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20201223T014816Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=82cb69f6139091d99864ad2312c6bd292177fbae9892c6fccee053ae33979f71
I also sent a full list of intermediate files to your email (torptube@gmail.com) via my Gmail box (jingzhejiang@gmail.com). Thank you again!
Ginger
Hi Matt,
I know you must be busy. I just want to make sure that you got my email. In that email, I updated the S3 download paths for those files. I hope you can see it. Thank you for your efforts; I look forward to good results.
Ginger
Oh man, sorry, yeah, been very busy lately. I haven't had time to download and evaluate the intermediate files. Can you refresh the links?
Of course! I have refreshed the links above, and they will be valid for a week. Thank you!
Hi Ginger,
I tried to download from the links above, but it didn't work. Can you refresh them again?
Cheers, Matt
YES, just updated. Thank you!
Hi Ginger,
I tried to download from the links, but it still says expired. Can you post new links?
Cheers, Matt
Hi Matt,
I have sent them to your Gmail inbox (torptube@gmail.com). Let me know if it doesn't work, thank you!
Ginger
Hi Matt,
I hope you are doing well! Is there any progress? I sent the data to your email (torptube@gmail.com) on Dec 24, 2020. Good luck, and thank you!
Ginger
Hi Bro,
I hope you are doing well! Have you been able to reproduce my bug on the EC2 instance? I am waiting for a solution for my data; if it can't be solved, I will have to turn to other assembly tools. Thank you again!
Ginger
Hi Ginger,
Sorry for the delay. Gonna work on this on EC2 today. Hopefully I will have a solution for you soon.
Cheers, Matt
OK, so I have a putative fix up, but the issue was actually running out of RAM on a c5d.9xlarge instance.
easyMerge.pl was horribly unoptimized, so I changed a few things to make it thread better and be a little more RAM-efficient per thread, but given your dataset, I can't get it under 8 GB/thread. So given the RAM/thread allocation on the c5s, you will be underutilized thread-wise, since you can only use around 20% of the threads per machine on this step.
I am going to test on the r5d.24xlarge to see how it scales given a much higher RAM/thread allocation.
But in the meantime, it should work if you want to run it under your current VM conditions.
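To make the thread budget concrete, here is the arithmetic under the ~8 GB/thread figure above (instance specs are from AWS; the exact per-thread peak depends on the dataset, and the result lands in the same ballpark as the ~20% mentioned above):

```shell
ram_gb=192       # c5d.24xlarge total memory
per_thread_gb=8  # approximate easyMerge.pl peak per thread on this dataset
vcpus=96
usable=$((ram_gb / per_thread_gb))
echo "c5d.24xlarge: $usable of $vcpus threads usable ($((100 * usable / vcpus))%)"
# prints: c5d.24xlarge: 24 of 96 threads usable (25%)

# An r5d.24xlarge has 768 GiB for the same 96 vCPUs, so RAM stops being the bottleneck:
echo "r5d.24xlarge: $((768 / per_thread_gb)) of $vcpus threads usable"
# prints: r5d.24xlarge: 96 of 96 threads usable
```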
Wait, don't use that version.
It "succeeds" at easyMerge.pl, but there are problems with the output that I didn't see last night, and it then fails in mergeWrapper.pl.
I'll revert for now and work on a fix.
Thank you very much! Waiting for your good news!
Hi, Matt! Is there any progress? :P
Hello, Matt! I hope you are well! Do you think this is a bug that can be solved in a short time?
Hi,
I ran virmap on a c5d.24xlarge (96 vCPU, 192 GiB, 4x 800 GB) instance with an 838 GB swap configured.
It works for many libraries, but fails for other libraries of similar size (maybe those with too many viral contigs? I'm not sure). The failing libraries throw out many empty files after the
12S-WGA.filtered.fa
step. Here is the record from the error log file.
I also checked the end of the file 12S-WGA.filter.err.bz2 and found error messages on the last line:
Is this a memory error, even with an 838 GB swap configured? Can I overcome it by adjusting the swap settings, or do I have to move to an expensive, big-memory instance?
ANOTHER IMPORTANT QUESTION: can virmap continue executing from the interrupted step, thus saving some time and money?
Thank you for your reply!
12S-WGA.err.gz 12S-WGA.filter.err.gz