alekseyzimin / masurca

GNU General Public License v3.0

SLURM submit error #67

Open stubrownGal opened 5 years ago

stubrownGal commented 5 years ago

I am trying to run Masurca 3.2.8 on our new SLURM-based cluster. I am assembling 40 Gb of Nanopore reads plus Illumina PE and Jump libraries. Masurca ran for a few hours and completed the read correction phase, but failed when trying to submit mega-read pass-1 jobs to the grid.

The stderr output is below. I noticed that there is no grid-specific information in the assemble.sh script, but it calls another script, masurca/3.2.8/bin//mega_reads_assemble_cluster.sh, which does contain grid commands that are wrong for our system (it is set up for an SGE grid, not SLURM; see the second section below). I did include some SLURM information in the config.txt.

How do I correctly set up Masurca for our SLURM based cluster?

[browns02@bigpurple-ln1 Masurca]$ ./assemble.sh
[Tue Sep 25 15:44:38 EDT 2018] Processing pe library reads
[Tue Sep 25 15:52:49 EDT 2018] Processing sj library reads
[Tue Sep 25 15:59:39 EDT 2018] Average PE read length 101
[Tue Sep 25 15:59:40 EDT 2018] Using kmer size of 67 for the graph
[Tue Sep 25 15:59:40 EDT 2018] MIN_Q_CHAR: 33
[Tue Sep 25 15:59:40 EDT 2018] Creating mer database for Quorum
[Tue Sep 25 16:22:05 EDT 2018] Error correct PE
[Tue Sep 25 16:48:36 EDT 2018] Error correct JUMP
[Tue Sep 25 17:07:41 EDT 2018] Estimating genome size
[Tue Sep 25 17:21:07 EDT 2018] Estimated genome size: 2360747251
[Tue Sep 25 17:21:07 EDT 2018] Creating k-unitigs with k=67
[Tue Sep 25 17:56:30 EDT 2018] Creating k-unitigs with k=31
[Tue Sep 25 18:47:29 EDT 2018] Filtering mate pairs
Assuming outtie orientation
Assuming outtie orientation
Chimeric/Redundant jump reads:
52378114 chimeric_sj.txt
38148468 redundant_sj.txt
90526582 total
[Tue Sep 25 23:57:14 EDT 2018] Creating FRG files
[Wed Sep 26 00:20:47 EDT 2018] Computing super reads from PE
[Wed Sep 26 01:43:09 EDT 2018] Using CABOG from /gpfs/share/apps/masurca/3.2.8/bin/../CA8/Linux-amd64/bin
[Wed Sep 26 01:43:09 EDT 2018] Running mega-reads correction/assembly
[Wed Sep 26 01:43:09 EDT 2018] Using mer size 15 for mapping, B=15, d=0.02
[Wed Sep 26 01:43:09 EDT 2018] Estimated Genome Size 2360747251
[Wed Sep 26 01:43:09 EDT 2018] Estimated Ploidy 1
[Wed Sep 26 01:43:09 EDT 2018] Using 32 threads
[Wed Sep 26 01:43:09 EDT 2018] Output prefix mr.41.15.15.0.02
[Wed Sep 26 01:43:09 EDT 2018] Using 25x of the longest ONT reads
[Wed Sep 26 01:46:29 EDT 2018] Reducing super-read k-mer size
[Wed Sep 26 02:21:04 EDT 2018] Mega-reads pass 1
[Wed Sep 26 02:21:04 EDT 2018] Running on the grid in 89 batches
[Wed Sep 26 02:29:01 EDT 2018] submitting SGE create_mega_reads jobs to the grid
[Wed Sep 26 02:29:01 EDT 2018] create_mega_reads failed on the grid
[Wed Sep 26 02:29:01 EDT 2018] mega-reads pass 1 on the grid failed or stopped, please re-run assemble.sh
[Wed Sep 26 02:29:01 EDT 2018] Assembly stopped or failed, see CA.mr.41.15.15.0.02.log

#this is the batch size for grid execution
PBATCH_SIZE=2000000000
GRID_ENGINE="SGE"
QUEUE=""
USE_SGE=0
PACBIO=""
NANOPORE=""
ONEPASS=0
GC=
RC=
NC=
if tty -s < /dev/fd/1 2> /dev/null; then
  GC='\e[0;32m'
  RC='\e[0;31m'
  NC='\e[0m'
fi

zrlewis commented 5 years ago

@stubrownGal Any luck getting this running on SLURM? I just ran into a similar problem.

stubrownGal commented 5 years ago

I have made some more progress, but I can't say exactly what was changed. Basically, there are bits of SGE-specific code in many places. I have tracked some of them down and got it to run through the SuperReads step, but it dies at megaReads.

I found this code for "create_mega_reads.sh" in mr_pass1, the most recent output folder created. It is full of SGE-specific code that will not work on my SLURM system.

[browns02@bigpurple-ln4 mr_pass1]$ more create_mega_reads.sh

#!/bin/sh
if [ ! -e mr.batch$SGE_TASK_ID.success ];then
/gpfs/share/apps/masurca/3.2.8/bin/create_mega_reads -s 5967832386 -m 15 --psa-min 12 --stretch-cap 10000 -k 41 -u ../guillaumeKUnitigsAtLeast32bases_all.41.fasta -t 32 -B 15 --max-count 5000 -d 0.02 -r ../superReadSequences.named.fasta -p lr.batch$SGE_TASK_ID -o mr.batch$SGE_TASK_ID.tmp && mv mr.batch$SGE_TASK_ID.tmp mr.batch$SGE_TASK_ID.txt && touch mr.batch$SGE_TASK_ID.success
else
echo "job $SGE_TASK_ID previously completed successfully"
fi
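For readers hitting the same wall: the generated script depends on SGE only through the `SGE_TASK_ID` variable. SLURM array jobs expose the equivalent index as `SLURM_ARRAY_TASK_ID`. The sketch below shows one way a wrapper could pick up the index from either scheduler. It is not the MaSuRCA author's fix, and the helper names `resolve_task_id` and `batch_marker` are hypothetical.

```shell
#!/bin/sh
# Sketch only: return the array-task index from whichever scheduler started us.
# SLURM sets SLURM_ARRAY_TASK_ID for array jobs; SGE sets SGE_TASK_ID
# (and uses the literal string "undefined" for non-array jobs).
resolve_task_id() {
    if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
        printf '%s\n' "$SLURM_ARRAY_TASK_ID"
    elif [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
        printf '%s\n' "$SGE_TASK_ID"
    else
        return 1    # not running as an array task
    fi
}

# Build the success-marker name used by the generated script quoted above.
batch_marker() {
    printf 'mr.batch%s.success\n' "$1"
}

# Guarded usage: prints nothing when run outside an array job.
if id=$(resolve_task_id); then
    echo "marker for this task: $(batch_marker "$id")"
fi
```

With a shim like this, a SLURM submission would presumably use an array range matching the batch count reported in the log (for example `sbatch --array=1-89`), but the remaining sbatch options depend on the site and are not covered here.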

zrlewis commented 5 years ago

@stubrownGal Okay. It looks like SLURM support might not be fully integrated yet, judging from the mega_reads_assemble_cluster.sh script. I'll try running it locally, setting USE_GRID=0
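A small sketch of that workaround, for anyone following along: it rewrites the `USE_GRID` line in a MaSuRCA config file to `0` and leaves every other line untouched. The function name `disable_grid` is hypothetical, and the sketch assumes the config already contains a line beginning with `USE_GRID`.

```shell
#!/bin/sh
# Sketch only: force USE_GRID=0 in a config file, preserving all other lines.
# Assumes a line starting with USE_GRID already exists (spaces around '=' allowed).
disable_grid() {
    cfg=$1
    tmp="$cfg.tmp.$$"
    # Write to a temporary file first so a failed sed does not clobber the config.
    sed 's/^USE_GRID[[:space:]]*=.*/USE_GRID=0/' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Usage (not executed here): disable_grid config.txt
```

After editing the config, assemble.sh has to be regenerated from it before re-running, since the grid settings are baked into the generated script.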

zrlewis commented 5 years ago

@alekseyzimin Any status update for SLURM support in v 3.2.9? Running mega-reads locally on a large genome is very slow in my case.

sunnycqcn commented 5 years ago

Yes. It took me about two months with 120 CPUs.

-- Fuyou Fu, Ph.D. Department of Botany and Plant Pathology Purdue University USA

alekseyzimin commented 5 years ago

Still working on SLURM support. Currently only SGE is supported. MaSuRCA has two parts: correction and assembly. Assembly runs on SLURM already, correction does not yet.

-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 http://www.genome.umd.edu http://masurca.blogspot.com

stubrownGal commented 5 years ago

Thanks for the update.

I finally got it to run to completion without grid support, using 32 threads on our large memory machine.

Stuart M. Brown, Ph.D. Center for Health Informatics and Bioinformatics New York University School of Medicine


alekseyzimin commented 5 years ago

Great to hear that!

--Aleksey

zrlewis commented 5 years ago

@alekseyzimin Thanks for the update. Keep us posted on SLURM support. I'm limited to nodes with 20 CPUs, so I wouldn't be able to take the approach that @sunnycqcn did, unfortunately.

alekseyzimin commented 5 years ago

Hi,

I have a beta version with SLURM support. You run the main script on a single multithreaded, high-memory machine. When it reaches one of the expensive steps, such as create_mega_reads or the overlapper in assembly, the code prepares the batch jobs, prints out the command to submit them (sbatch.....), and exits. You then submit the jobs and, once they are all done, re-run assemble.sh. If any jobs failed, it will stop again and ask you to re-submit them. Successful jobs will not have to be re-run.
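To check which batches still need to run before re-running assemble.sh, something like the following sketch may help. It assumes the success markers are still named `mr.batch<N>.success`, as in the SGE-era script quoted earlier in the thread; the beta release may name them differently. The function name `pending_batches` is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the indices of batches in DIR that have no success marker.
# Marker naming (mr.batch<N>.success) is an assumption taken from the older
# generated script; adjust it if the beta release uses another pattern.
pending_batches() {
    dir=$1
    total=$2
    i=1
    while [ "$i" -le "$total" ]; do
        [ -e "$dir/mr.batch$i.success" ] || printf '%s\n' "$i"
        i=$((i + 1))
    done
}

# Usage (not executed here): pending_batches mr_pass1 89
```

An empty result would suggest every batch finished and assemble.sh can be re-run; otherwise the printed indices are the ones to re-submit.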

I am working on a better way to implement this by setting up dependencies.

You can get the 3.3.0b version with SLURM support here:

https://github.com/alekseyzimin/masurca

Best, Aleksey

zrlewis commented 5 years ago

@alekseyzimin Fantastic! I look forward to trying it out.

If I have already started a run, which has been having trouble getting past mega-reads, then do I need to restart or can I resume with the new version?

alekseyzimin commented 5 years ago

Hi,

You can simply restart it with the new version.

--Aleksey
