Open manwensu opened 1 month ago
Hi,
I think there are a variety of issues going on, but the main underlying one is with RepeatModeler and/or something unique within the genomes themselves. A few initial questions: are you running the jobs with the same number of cores and memory available for all of them? Are these the same genomes that are causing the issues here: https://github.com/TobyBaril/EarlGrey/issues/135 ? And do you know which version of RepeatModeler is running? (RepeatModeler is the package causing the issue, rather than RepeatMasker.)
Assuming the version of RepeatModeler is consistent between all three runs and the number of cores and memory available is also consistent, my hypothesis is that, by chance, the seeds RepeatModeler chose in the two later runs for the underlying RepeatScout and RECON packages are sampling regions of the genome containing repeats (likely satellites) for which the underlying algorithm(s) struggle to create consensus sequences. The fact that these issues are occurring in different species of Bombus makes me suspicious that there is something odd/interesting going on within these bumblebee genomes that's causing the issue. What I recommend doing as a trial is running the same version of RepeatModeler by itself on the genome(s), outside of Earl Grey, using the same command
RepeatModeler -engine ncbi -threads ${NUM_THREADS} -database ${DATABASE}
to see if the issue occurs there too.
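As a rough sketch of that standalone trial: the script below builds a RepeatModeler database from the genome and then runs the command quoted above. The file name genome.fasta, the database name bombus_test, the thread count, and the DRY_RUN switch are all placeholders made up for illustration, not anything Earl Grey itself uses. It defaults to a dry run that only prints the commands, so nothing is launched until DRY_RUN=0 is set. If I recall correctly, older RepeatModeler releases used -pa rather than -threads, so check the help output of your installed version first.

```shell
#!/bin/sh
# Sketch only: run RepeatModeler by itself, outside Earl Grey.
# All defaults below are hypothetical placeholders; adjust to your setup.
set -eu

GENOME="${GENOME:-genome.fasta}"      # hypothetical input FASTA
DATABASE="${DATABASE:-bombus_test}"   # hypothetical database name
NUM_THREADS="${NUM_THREADS:-16}"      # use the same value as in the Earl Grey runs
DRY_RUN="${DRY_RUN:-1}"               # 1 = print the commands without running them

# Print a command, and execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

standalone_repeatmodeler() {
  # Build the sequence database that RepeatModeler reads from.
  run BuildDatabase -name "$DATABASE" "$GENOME"
  # Same command as quoted in the comment above.
  run RepeatModeler -engine ncbi -threads "$NUM_THREADS" -database "$DATABASE"
}

standalone_repeatmodeler
```

Keeping NUM_THREADS and the available memory identical to the failing Earl Grey jobs makes the comparison cleaner, since differing resources are one of the variables being ruled out.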
The issue you outline in point three above appears to be a problem with the post-processing script, which calculates the Kimura distance between the repeats in the genome and the consensus sequences used to find them. It appears that the script crashed without completing the calculations, as it should have deleted those temporary files upon completion. There were some issues with it early on, which we've now patched. Seeing as you still have the "filteredRepeats.gff" and the TE library, you can use the scripts here to calculate the divergence and create the plots: https://github.com/jamesdgalbraith/EarlGreyDivergenceCalc (for the Rscript, make sure to use the --axis_flip flag so the plots match typical EarlGrey plots).
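For reference, and assuming the distance meant here is the standard Kimura two-parameter (K80) distance commonly used for repeat landscape plots (the linked scripts may apply further adjustments, which I have not checked), the divergence between a genomic repeat copy and its consensus is

$$
K = -\frac{1}{2}\,\ln\!\left[(1 - 2p - q)\,\sqrt{1 - 2q}\right]
$$

where $p$ is the proportion of aligned sites that differ by a transition and $q$ is the proportion that differ by a transversion. For a copy identical to its consensus, $p = q = 0$ and $K = 0$.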
Thank you very much, James! I will try that as you suggested.
Hi @manwensu, any updates on the RepeatModeler run? Some of the solutions suggested in #145 might work for future runs, and at least prevent the eternal elongation of strange low complexity repeats!
Hi Tobias, I ran RepeatModeler v2.0.1 on the previously failed species Bombus.dahlbomii in Singularity. It worked well and didn't take long to run. I am now trying to run RepeatModeler v2.0.5 separately. Many thanks for your suggestions; I will try them later.
What can I do? Thank you very much!