Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Does -srand <number> option cause RepeatModeler to be deterministic? #203

Open alankuo1 opened 1 year ago

alankuo1 commented 1 year ago

I ran RepeatModeler twice on the same genome, passing the same -srand value both times. However, the resulting repeat libraries differed in the number of elements. Is my understanding of -srand incorrect? I expected identical outputs.

My RepeatModeler installation runs in a Docker container. The command, which I ran twice, is:

shifter --image=docker:dfam/tetools:1.7 RepeatModeler -database Polarella_glacialis_CCMP2088 -threads 30 -srand 2756104381

The output file Polarella_glacialis_CCMP2088-families.fa has 1529 sequences in the 1st run, and 1510 sequences in the 2nd run.
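A sequence count alone does not show whether the two libraries mostly overlap or differ throughout. The following is a minimal sketch, not part of RepeatModeler, that compares two family FASTA files by sequence content while ignoring headers. The function names and the choice to compare uppercased sequences by SHA-256 digest are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_digests(path):
    """Count sequences by digest of the uppercased sequence (headers ignored)."""
    return Counter(
        hashlib.sha256(seq.upper().encode()).hexdigest()
        for _, seq in read_fasta(path)
    )


def compare_libraries(path_a, path_b):
    """Summarize how many sequences are shared or unique between two libraries."""
    a, b = sequence_digests(path_a), sequence_digests(path_b)
    return {
        "total_a": sum(a.values()),
        "total_b": sum(b.values()),
        "shared": sum((a & b).values()),   # multiset intersection
        "only_a": sum((a - b).values()),   # present only in the first file
        "only_b": sum((b - a).values()),   # present only in the second file
    }
```

Calling `compare_libraries` on the two copies of Polarella_glacialis_CCMP2088-families.fa (saved under different names) would show whether the runs share most families exactly or diverge more broadly. Exact matching is strict: two consensus sequences that differ by a single base count as different.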

rmhubley commented 1 year ago

I am not sure which versions of RepeatModeler (e.g. 2.0.3, 2.0.4) and rmblast (e.g. 2.13.0, 2.14.0) shifter is using. There was a problem with RMBlast (fixed in 2.14.0) where it could generate slightly different (but equally scoring) alignments in a multi-threaded context. When used with RepeatModeler with more than one thread (e.g. -pa 10), it could generate different results even when the same seed number was used. If you upgrade to RepeatModeler 2.0.4 and RMBlast 2.14.0, this problem should go away.
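To illustrate how equally scoring alignments can defeat a fixed seed, here is a toy sketch. It is not RMBlast's code, and the mechanism shown (ties resolved by arrival order) is an assumption chosen for illustration. The hit records, `best_hit`, `best_hit_stable`, and `simulate` are all hypothetical. Shuffling the hit list stands in for worker threads finishing in a different order on each run.

```python
import random

# Two hits tie on score; a third scores lower. All values are made up.
HITS = [
    {"score": 100, "start": 10, "end": 60},
    {"score": 100, "start": 12, "end": 62},
    {"score": 80, "start": 5, "end": 40},
]


def best_hit(hits):
    """Pick the top-scoring hit; a tie goes to whichever hit arrived first."""
    best = None
    for hit in hits:
        if best is None or hit["score"] > best["score"]:
            best = hit
    return best


def best_hit_stable(hits):
    """Pick the top-scoring hit; ties are broken by (start, end), not arrival."""
    return min(hits, key=lambda h: (-h["score"], h["start"], h["end"]))


def simulate(pick, trials=50, seed=0):
    """Return the set of distinct winners seen across shuffled arrival orders."""
    rng = random.Random(seed)
    winners = set()
    for _ in range(trials):
        order = HITS[:]
        rng.shuffle(order)  # stand-in for threads completing in varying order
        winner = pick(order)
        winners.add((winner["start"], winner["end"]))
    return winners
```

In this sketch `simulate(best_hit)` sees both tied hits win across trials, while `simulate(best_hit_stable)` always returns the same one. A random seed controls sampling, but it cannot control thread scheduling, so any order-dependent tie-break can still change the result.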

alankuo1 commented 1 year ago

Hi Richard, it seems that you never got this message from Robert Hubley. Maybe you can answer Robert's questions.

Best, Alan

(Sent by email in reply to Robert Hubley's comment of Fri, Aug 4, 2023, 4:45 PM.)

rdhayes commented 1 year ago

Hello, our last test was with the v1.7 docker container described at https://github.com/Dfam-consortium/TETools

Alan, it appears that we should test with the more recent v1.85 release, which would upgrade rmblast from 2.13.0 to 2.14.0, according to that repo's changelog.

alankuo1 commented 1 year ago

Robert, does that make sense? Does the upgrade of rmblast have any effect on our indeterminacy issue?

Cheers, Alan

(Sent by email in reply to Richard D. Hayes's comment of Tue, Aug 29, 2023, 1:08 PM.)

rdhayes commented 1 year ago

Hello, we have confirmed that our most recent tests with nondeterministic results (22724 scaffolds, 4.2 Gbases) were done with the TETools container v1.85. That changelog indicates that we used:

rmhubley commented 1 year ago

Do you have the log files from both runs? Also, if you share the sequence file, I could kick off a reproduction run on our servers to see if I can locate the issue.
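When comparing logs from two runs, a useful first step is finding where they first disagree. The sketch below is a generic helper, not a RepeatModeler tool; the function names are hypothetical. It assumes the logs are plain text and does not mask timestamps or run times, which would need to be stripped first if the logs contain them.

```python
from itertools import zip_longest


def first_divergence(lines_a, lines_b):
    """Return (line_number, line_a, line_b) at the first difference, else None.

    If one input is shorter, the missing line is reported as None.
    """
    pairs = zip_longest(lines_a, lines_b)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number, a, b
    return None


def first_divergence_in_files(path_a, path_b):
    """Apply first_divergence to two text files, ignoring trailing newlines."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_divergence(
            (line.rstrip("\n") for line in fa),
            (line.rstrip("\n") for line in fb),
        )
```

The first divergent line indicates which step or round of the run to inspect more closely.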