Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Chromosome larger than one gigabase #226

Closed fmaumusINRA closed 9 months ago

fmaumusINRA commented 1 year ago

Dear RepeatMasker friends,

We are having an issue with RepeatMasker never finishing its post-processing step. We are working with an assembly that contains chromosomes larger than 1 Gbp, and I am wondering whether there is a size limit in the code that could cause this issue.

Thanks a lot for your help, Florian

rmhubley commented 1 year ago

Wow...that's intimidating. For larger runs (e.g., full genomes or large chromosomes), I would recommend breaking up the sequence (say, into 50MB non-overlapping batches) and running the batches separately through RepeatMasker. The obvious disadvantage is the merging and adjusting of the result files afterwards. I do have a Nextflow pipeline that I use for full genome runs that does this automatically. If you're familiar with Nextflow and would like to give it a try, let me know.
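To illustrate the batching idea described above, here is a minimal Python sketch. It is not taken from RepeatMasker or from the Nextflow pipeline; the function names `batch_ranges` and `to_chromosome_coords` are hypothetical, and the 50,000,000-base batch size simply mirrors the figure suggested in the comment. It computes non-overlapping batch ranges for one chromosome and shows the kind of coordinate adjustment needed when results are merged.

```python
from typing import Iterator, Tuple


def batch_ranges(seq_len: int, batch_size: int = 50_000_000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) ranges that tile a sequence
    of length seq_len without overlap."""
    for start in range(0, seq_len, batch_size):
        yield start, min(start + batch_size, seq_len)


def to_chromosome_coords(batch_start: int, hit_start: int, hit_end: int) -> Tuple[int, int]:
    """Shift 1-based hit coordinates reported within a batch back to
    1-based coordinates on the full chromosome.

    batch_start is the 0-based offset of the batch on the chromosome.
    """
    return hit_start + batch_start, hit_end + batch_start


# Hypothetical 1.2 Gbp chromosome, as an example of a sequence over 1 Gbp.
ranges = list(batch_ranges(1_200_000_000))
print(len(ranges), ranges[0], ranges[-1])

# A hit at positions 1..100 of the second batch, mapped back to the chromosome.
print(to_chromosome_coords(ranges[1][0], 1, 100))
```

One caveat of non-overlapping batches is that a repeat spanning a batch boundary is annotated as two fragments, so a merge step may also need to handle hits that end exactly at a boundary.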

fmaumusINRA commented 1 year ago

Thanks a lot, Rob! Your Nextflow pipeline would be wonderfully helpful! How could you share it?

rmhubley commented 1 year ago

Ok...I just made the GitHub project for it public: https://github.com/Dfam-consortium/RepeatMasker_Nextflow. Use the issue tracker on the RepeatMasker_Nextflow project if you have any questions.

fmaumusINRA commented 1 year ago

Thank you so much, my PhD student should clone this today. All the best, Florian

fmaumusINRA commented 9 months ago

Thank you very much, Robert. The Nextflow version allowed us to run on chromosomes over 1 Gbp.

For your information, in our hands, line 313 of the script /RepeatMasker_Nextflow.nf:

export PATH=${ucscToolsDir}/\$PATH

had to be changed to the following, with a colon separating the tools directory from the existing PATH:

export PATH=${ucscToolsDir}/:\$PATH
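As a small illustration of why the colon matters, the Python sketch below mimics how a POSIX shell splits PATH into search directories. The directory names are made up for the example, and `search_dirs` is a hypothetical helper, not part of the pipeline.

```python
def search_dirs(path_value: str) -> list:
    """Split a POSIX-style PATH string into its search directories."""
    return path_value.split(":")


ucsc_tools_dir = "/opt/ucsc"            # hypothetical install location
old_path = "/usr/local/bin:/usr/bin"    # hypothetical existing PATH

# Original line: the directory is glued onto the first existing entry.
broken = ucsc_tools_dir + "/" + old_path
# Changed line: a colon keeps the directory as its own entry.
fixed = ucsc_tools_dir + "/:" + old_path

print(search_dirs(broken))  # ['/opt/ucsc//usr/local/bin', '/usr/bin']
print(search_dirs(fixed))   # ['/opt/ucsc/', '/usr/local/bin', '/usr/bin']
```

Without the separator, the tools directory never appears as a search entry, and the first directory of the existing PATH is lost as well.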

Kind regards, Florian

On Tue, 28 Nov 2023 at 02:28, Robert Hubley @.***> wrote:

Closed #226 https://github.com/rmhubley/RepeatMasker/issues/226 as completed.


-- Florian Maumus | INRAE http://www.inra.fr/en - URGI http://urgi.versailles.inra.fr/ | +33 1 30 83 31 74