galaxyproject / usegalaxy-tools

usegalaxy.* common tools

Request: increase memory allocation for repeatmasker_wrapper #839

Open fubar2 opened 1 month ago

fubar2 commented 1 month ago

Currently, repeatmasker_wrapper has

Cores Allocated  16
Memory Allocated (MB) 59392

A single chromosome works but a whole VGP haplotype fails OOM. Currently trying to get a RAM graph from running the same job but will take a while.

mvdbeek commented 1 month ago

It might be quite reasonable to split by chromosomes, should be able to do this: https://usegalaxy.org/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fsplit_file_to_collection%2Fsplit_file_to_collection%2F0.5.2&version=latest That'll be both faster and more memory efficient.
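For readers who want to see what a per-chromosome split amounts to, here is a minimal Python sketch. It is not the implementation of the split_file_to_collection tool linked above; the function names `iter_fasta_records` and `split_fasta` are made up for illustration, and it assumes each `>` header starts one chromosome or contig.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each record in a FASTA stream."""
    header = None
    seq: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of sequence ID -> single-record FASTA text."""
    chunks: Dict[str, str] = {}
    for n, (header, seq) in enumerate(iter_fasta_records(lines), start=1):
        tokens = header[1:].split()
        # The ID is taken as the first whitespace-delimited token of the header;
        # the fallback name is an arbitrary choice for headers with no ID.
        seq_id = tokens[0] if tokens else f"record_{n}"
        chunks[seq_id] = "\n".join([header, *seq]) + "\n"
    return chunks
```

Each value in the returned mapping can be written to its own file and masked as a separate job, so peak memory should scale with the largest chromosome rather than with the whole haplotype.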

fubar2 commented 1 month ago

That's an interesting option to pursue @mvdbeek. Thanks! Fasta contigs can be concatenated, but joining dozens of GFF with headers will probably need a new tool, so probably not practicable for me - but if someone wants to take care of that and prepare a demonstration, it could be a solution.

Since it works fine on EU and there are other things to do, I'll remove it from the workflow for now, until that's done.

fubar2 commented 1 month ago

@mvdbeek: Here's why that job failed OOM with 59GB. Run on a local Galaxy with 12 cores, it uses < 1 GB RAM for most of the run, but right at the end RAM seems to blow out, peaking at ~63 GB or so - just a few more GB would probably work on .org

image

fubar2 commented 1 month ago

@mvdbeek: TreeValGal ignores the fasta output you might be assuming and only uses the GFF3.

A test at the GMOD gff3 tester shows that concatenating 2 or more GFF3, each with correct headers, will create an invalid GFF3. The message explains that it can be fixed and correctly ordered with one of their tools. If someone wants to wrap that new tool, it could be a solution. Sounds like more work than getting the allocation right.
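To illustrate why naive concatenation breaks and what a merge has to do, here is a hedged Python sketch. It is not the GMOD tool mentioned above; `merge_gff3` is a hypothetical helper. It assumes the problem is mainly the repeated `##gff-version` header and repeated directives, so it keeps one version line, de-duplicates `##sequence-region` directives, and drops other `#` lines and any embedded `##FASTA` section. It does not reorder features or rewrite `ID` attributes, so inputs with clashing IDs would still fail validation.

```python
from typing import Iterable, List


def merge_gff3(files: Iterable[Iterable[str]]) -> List[str]:
    """Merge several GFF3 inputs into one list of lines with a single header."""
    version_line = None
    regions: List[str] = []
    seen_regions = set()
    features: List[str] = []
    for lines in files:
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith("##gff-version"):
                # Keep only the first version pragma; it must be the first line.
                if version_line is None:
                    version_line = line
            elif line.startswith("##sequence-region"):
                if line not in seen_regions:
                    seen_regions.add(line)
                    regions.append(line)
            elif line.startswith("##FASTA"):
                break  # skip the embedded sequence section of this file
            elif line.startswith("#"):
                continue  # other directives and comments are dropped in this sketch
            else:
                features.append(line)
    return [version_line or "##gff-version 3", *regions, *features]
```

Writing the returned lines out joined with newlines gives one file with a single header. Whether it then validates depends on the inputs, in particular on feature IDs being unique across the pieces.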

mvdbeek commented 1 month ago

Do you maybe have the top 100 lines of 2 valid GFF files? Nothing I find on the web actually validates against https://genometools.org/cgi-bin/gff3validator.cgi. https://usegalaxy.org/u/marius/w/merge-gff3 probably works, but hard to test if nothing actually validates. And the one file I fixed up manually complains about overlapping ids when I duplicate it 😆

mvdbeek commented 1 month ago

Ugh, this was hard, but finally I got 2 input files that actually validate. Here's an example run https://usegalaxy.org/workflows/invocations/84e15596bd4fc608?from_panel=true

fubar2 commented 1 month ago

@mvdbeek: Thanks! Will give that a try tomorrow.

fubar2 commented 1 month ago

@mvdbeek: More and more layers - it's not that simple of course. Ignoring the gff fixer for a moment for simplicity, a contig split repeatmasker test with a 500MB fish fasta fails red on usegalaxy.org.

natefoo commented 1 month ago

I can increase this of course but I'm very confused since afaict EU allocates only 40 GB (it is in their local tools.yml but it doesn't look like they override memory).

@fubar2 do you have a run on EU you can check the memory allocation/usage of?

natefoo commented 1 month ago

Ah I forgot about their automatic resubmission.
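As a rough illustration of how resubmission could explain a 40 GB allocation succeeding where a fixed 59 GB one fails, here is a small Python sketch. It is not EU's actual configuration: the names `run_with_resubmission` and `OutOfMemory`, the doubling factor, and the attempt limit are all assumptions made for the example.

```python
from typing import Callable, Optional


class OutOfMemory(Exception):
    """Raised by the (hypothetical) job runner when a job is killed for memory."""


def run_with_resubmission(
    run_job: Callable[[int], None],
    start_mb: int,
    factor: float = 2.0,   # assumed growth factor per retry
    max_attempts: int = 3,  # assumed retry limit
) -> Optional[int]:
    """Return the allocation (MB) that succeeded, or None if every attempt hit OOM."""
    mem_mb = start_mb
    for _ in range(max_attempts):
        try:
            run_job(mem_mb)
            return mem_mb
        except OutOfMemory:
            # Retry the same job with a larger allocation.
            mem_mb = int(mem_mb * factor)
    return None


def fake_job(mem_mb: int) -> None:
    """Stand-in for a job whose peak usage is about 63 GB, as in the graph above."""
    if mem_mb < 63 * 1024:
        raise OutOfMemory()
```

Under these assumptions, `run_with_resubmission(fake_job, 40 * 1024)` fails once at 40960 MB and succeeds on the retry at 81920 MB, whereas a single fixed attempt at 59392 MB would simply fail.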

natefoo commented 1 month ago

Bumped to 76GB.

fubar2 commented 1 month ago

For efficiency, @mvdbeek's solution for getting a valid GFF after splitting into contigs could be very helpful. Now that it seems to have enough RAM, the WF starts and some parts run, but it does not end well. Repeatmasker is a very unruly tool but not sure how much more effort it deserves - unless this stress test provides a useful edge case for workflow job submission?

fubar2 commented 1 month ago

@natefoo: Sadly https://vgp.usegalaxy.org/datasets/f9cad7b01a4721353343582b8c4d1cc2/preview job ended green but with empty outputs ~28 hours after starting with mongo RAM allocation. See @mvdbeek's sensible map reduce suggestion and the conclusion of an attempt at implementing it above.

No need for more effort trying to tame this unruly tool for VGP scale operation. TreeValGal still has a windowmasker model free repeat density bigwig - so not crucial.

OTOH: If repeatmasker's dodgy code is effectively and properly isolated as a tool, maybe the failing workflow here is useful as an edge case for testing extremely resource hungry hammering during workflow invocation over a collection.