Closed leedchou closed 1 year ago
You may randomly split that one normal BAM file into two and designate one of them as the "tumor." You cannot use the same BAM file for both, because then there will be no false positives in the training data. False positives arise when you have different reads for the same sample. With identical reads (as when using one single BAM file), there will be none.
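As a minimal sketch of such a split (not something posted in this thread): `samtools view` can subsample by read name with `-s` and write the reads it did not select to a second file with `-U`, which gives two disjoint halves. The file names, the seed, and the helper name `split_bam` are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split one normal BAM into two disjoint halves with samtools.
# All file names and the seed are placeholders (assumptions).

split_bam() {
    local in_bam=$1 half_a=$2 half_b=$3
    local seed=${4:-42}

    # -s SEED.FRACTION keeps about 50% of the read names (mates stay together);
    # -U receives every read that was not kept, so the halves do not overlap.
    samtools view -b -s "${seed}.5" -o "$half_a" -U "$half_b" "$in_bam" &&
        samtools index "$half_a" &&
        samtools index "$half_b"
}

# Example call, only attempted when the tool and the input are present.
if command -v samtools >/dev/null 2>&1 && [ -f normal.bam ]; then
    split_bam normal.bam designated_tumor.bam designated_normal.bam
fi
```

Each half then has roughly half the original depth, so a 60X normal would yield two files of about 30X each.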
Much appreciated. Here's another issue I ran into today:
I failed to pull the Docker image when trying to run BamSimulator_singleThread.sh. I guess a security mechanism on my HPC blocked the process. Would it work if I downloaded your Docker images on another device and then ran the simulation script? If so, where can I download your Docker images?
Thanks again.
You should be able to download the Docker image on another device, save the image as a .tar file, copy that .tar file to your HPC drive, and then load the image from that .tar file.
Another alternative may be to build the Docker image using the dockerfile at https://github.com/bioinform/somaticseq/tree/master/Dockerfiles (for bamsurgeon: https://github.com/litaifang/bamsurgeon/blob/master/Dockerfiles/bamsurgeon-1.1-3.dockerfile), tag it like the image you would be using, and then your system will use that local image instead.
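The two routes above can be sketched with standard Docker commands. This is only an illustration: `IMAGE` and `TARBALL` are placeholder values (assumptions), and the real image name is whatever the BamSimulator script tries to pull. The small `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set, since the steps belong on two different machines.

```shell
#!/usr/bin/env bash
# Sketch: move a Docker image to a machine that cannot pull from a registry.
# IMAGE and TARBALL are placeholders (assumptions); use the image name that
# the BamSimulator script actually tries to pull.
IMAGE=${IMAGE:-example/bamsurgeon:some-tag}
TARBALL=${TARBALL:-bamsurgeon_image.tar}

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On a machine with internet access:
run docker pull "$IMAGE"
run docker save -o "$TARBALL" "$IMAGE"

# After copying $TARBALL to the HPC drive (scp, rsync, ...), on the HPC:
run docker load -i "$TARBALL"

# Alternative: build locally from a downloaded dockerfile and give it the tag
# the script expects, so the local image is found and no pull is needed.
run docker build -t "$IMAGE" -f bamsurgeon-1.1-3.dockerfile .
```

If the cluster runs Singularity rather than Docker, the saved archive can usually be converted with `singularity build image.sif docker-archive://bamsurgeon_image.tar`, though whether the simulation scripts pick up such a local image is not confirmed here.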
Thanks for your answer, I will give it a try later. Before that, I ran BamSimulator successfully yesterday with the action set to bash. It is still running after 15 hours. How long should I expect this process to take on WGS data (60X, single thread)?
Best regards.
Yeah, Bamsurgeon can take a very long time. Why not use the multi-thread option?
Sorry, I just returned from vacation. I tried the multi-thread option by running the following script:
HOME_PATH=/variantcall/USER/chou
REF=/source/ref/hs37d5/hs37d5.fa
REPLICATE_001=/data/1/HG001.bam
REPLICATE_002=/data/2/HG001.bam
$HOME_PATH/somaticseq/somaticseq/utilities/singularities/bamSimulator/BamSimulator_multiThreads.sh \
--genome-reference $REF \
--tumor-bam-in $REPLICATE_001 \
--normal-bam-in $REPLICATE_002 \
--tumor-bam-out syntheticTumor.bam \
--normal-bam-out syntheticNormal.bam \
--split-proportion 0.5 \
--threads 8 \
--num-snvs 20000 \
--num-indels 8000 \
--min-vaf 0.0 \
--max-vaf 1.0 \
--left-beta 2 \
--right-beta 5 \
--min-variant-reads 2 \
--output-dir $HOME_PATH/TN_data/simulated/HG001 \
--action qsub
However, it did not work. When I checked qstat, I found that all 8 jobs were just waiting in the queue. I am sure these tasks were submitted to the same node on the cluster, because I have only one node to use. I wonder whether this is related to the number of nodes or CPUs.
When run in parallel, each "region" requires its own resources. Maybe your one node only has enough memory for one single task at a time?
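To make that reasoning concrete, here is a rough sketch of estimating how many regions fit on one node at a time. Every number is a made-up placeholder (assumption), not a measured requirement of BamSimulator or Bamsurgeon.

```shell
#!/usr/bin/env bash
# Sketch: choose how many regions to process at once on a single node.
# Every number here is a made-up placeholder, not a measured requirement.
node_mem_gb=64        # memory available on the node (assumption)
mem_per_region_gb=14  # memory one region's task needs (assumption)
threads=8             # value passed to --threads

# Concurrent tasks that fit in memory, at least 1 and at most $threads.
max_jobs=$(( node_mem_gb / mem_per_region_gb ))
if [ "$max_jobs" -lt 1 ]; then
    max_jobs=1
fi
if [ "$max_jobs" -gt "$threads" ]; then
    max_jobs=$threads
fi

echo "regions: $threads, concurrent tasks that fit: $max_jobs"
```

With a scheduler, the remaining tasks simply wait in the queue until resources free up. When running the per-region scripts directly, a tool such as GNU parallel can apply the same cap through its `-j` option.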
Yes, in my case there seems to be no difference in runtime between parallel and non-parallel mode. I modified the scripts you posted, and they are now running in parallel with GNU parallel.
Thanks.
Hi, @litaifang
You mentioned that an ideal simulated example comes from sequencing replicates of the same sample. However, in my case, I have only one normal BAM for each sample. Can I use that single normal BAM directly as both tumor-bam-in and normal-bam-in to get synthetic data? Would training a model on these simulated samples, rather than on ideal examples, hurt calling precision?
Best regards, Chou