NBISweden / GenErode

GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from historical and modern samples to study patterns of genome erosion.
GNU General Public License v3.0

RepeatModeler fails on big genomes #61

Closed mariannedehasque closed 8 months ago

mariannedehasque commented 12 months ago

I am trying to run repeatmodeler on a 6.2 Gb genome (a concatenated genome), but I get an error message. I think it might be related to this issue: https://github.com/Dfam-consortium/RepeatModeler/issues/101. I am running GenErode version 0.4.1, but I don't think this issue has been addressed in any of the updates since then.

Output of rule repeatmodeler below:

Building database GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2:
  Reading ../../GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2.upper.fasta...
The makeblastdb program did not generate the file GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2.nsq. Please check your input file(s) for potential formating errors.
/usr/local/bin/makeblastdb returned:

Building a new DB, current time: 10/11/2023 13:26:03
New DB name: ...
New DB title: ./OSWae0eqDW
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named ....
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 149 sequences in 60.8019 seconds.
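For anyone trying to reproduce this, it may help to report how large the concatenated reference is. The snippet below is a minimal sketch, not part of GenErode or RepeatModeler: the helper names `fasta_lengths` and `summarize` are made up for illustration, and it assumes a plain, uncompressed FASTA file. It counts the sequences, the total number of bases, and the longest sequence.

```python
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_name, length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the previous record
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length  # emit the last record


def summarize(lines: Iterable[str]) -> dict:
    """Return the sequence count, total bases, and the longest sequence."""
    count = 0
    total = 0
    longest = ("", 0)
    for name, length in fasta_lengths(lines):
        count += 1
        total += length
        if length > longest[1]:
            longest = (name, length)
    return {"sequences": count, "total_bp": total, "longest": longest}


# Hypothetical usage (the path is a placeholder):
# with open("reference.upper.fasta") as handle:
#     print(summarize(handle))
```

The file is read line by line, so memory use stays low even for a multi-gigabase reference.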

verku commented 12 months ago

Hi Marianne!

Thank you for submitting this issue, I was not aware of this problem. I'll see if I can include a fix in the upcoming version.

jcchacond commented 11 months ago

Hi Verena and Marianne,

I can confirm that, as Marianne suggested, the same error is present (for the same reference genome) in the latest version (0.5.1).

Thanks for looking into it.

verku commented 11 months ago

Testing different repeatmodeler versions revealed another issue with small genomes: when repeatmodeler 1.0.11 and repeatmasker 4.0.9 were used to mask the mitogenome, only some repeats were masked, but with repeatmodeler 2.0.4 and repeatmasker 4.0.9 most of the mitogenome was masked. Add a note to the documentation.
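One way to compare how much of the mitogenome each version combination masks is to compute the masked fraction of the output FASTA. This is only a sketch under stated assumptions: the function name `masked_fraction` is hypothetical, and it assumes that masked bases appear either as lowercase letters (soft-masking) or as `N` (hard-masking). Any `N` already present in the unmasked sequence would be counted as masked too.

```python
from typing import Iterable


def masked_fraction(lines: Iterable[str]) -> float:
    """Fraction of bases that are lowercase (soft-masked) or N (hard-masked)."""
    masked = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        # Assumption: lowercase = soft-masked, "N" = hard-masked
        masked += sum(1 for base in line if base.islower() or base == "N")
    return masked / total if total else 0.0


# Hypothetical usage (the path is a placeholder):
# with open("mitogenome.masked.fasta") as handle:
#     print(f"{masked_fraction(handle):.1%} of bases masked")
```

Running this on the masked mitogenome from each repeatmodeler/repeatmasker combination would give a single number per run to quote in the documentation note.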