clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

Eidos should restart faster or less often #849

Open kwalcock opened 4 years ago

kwalcock commented 4 years ago

Especially during development of rules, the round-trip time for incorporating a new rule is too long. One part of the solution is to use a server version of processors. That may be a good idea on its own, but there are other possibilities, and they might be discussed here.

kwalcock commented 4 years ago

Below are recent startup times for Eidos. Loading all the components in parallel takes over 36 seconds. Getting processors loaded and primed accounts for just under 36 seconds of that. Constructing a processor object takes only 15 ms; the rest is for "priming", which covers lazy initialization and other on-demand activities. If all the activities above the blank line in the table could be performed in parallel on 8 processor cores, the total of 66 seconds of work might be cut down to about 8 seconds. One problem is that processors can't be split into smaller, parallelizable components by Eidos. It might be done internally, though.

Component               Time (ms)
NestedArgumentExpander          1
MigrationHandler                3
NegationHandler                 4
HedgingHandler                  8
Processors                     15
AdjectiveGrounder              16
ConceptExpander                19
StopwordManager               274
ExtractorEngine              6112
EntityFinders               10009
OntologyHandler             14022
ProcessorsPrimer            35818

Complete parallel load      36259
EidosPrimer                  6261
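The 8-second figure assumes the work can be divided evenly across cores. The following illustrative Scala sketch (the name StartupBound is made up for this example) sums the timings above the blank line and computes the usual lower bound for indivisible tasks: wall time can be no shorter than the balanced share of the total work, nor shorter than the single longest task. With these numbers that is about 8.3 s if everything were divisible, but 35.8 s as long as ProcessorsPrimer runs as one piece, which is close to the measured complete parallel load.

```scala
// Illustrative sketch: bounds on parallel startup time from the measured timings.
object StartupBound {
  // Load times in ms for the components above the blank line in the table.
  val timesMs: Map[String, Long] = Map(
    "NestedArgumentExpander" -> 1L,
    "MigrationHandler" -> 3L,
    "NegationHandler" -> 4L,
    "HedgingHandler" -> 8L,
    "Processors" -> 15L,
    "AdjectiveGrounder" -> 16L,
    "ConceptExpander" -> 19L,
    "StopwordManager" -> 274L,
    "ExtractorEngine" -> 6112L,
    "EntityFinders" -> 10009L,
    "OntologyHandler" -> 14022L,
    "ProcessorsPrimer" -> 35818L
  )

  // Total work if every component were loaded one after another.
  val totalMs: Long = timesMs.values.sum

  // No schedule can finish before its single longest, indivisible task does.
  val longestMs: Long = timesMs.values.max

  // Best-case wall time on `cores` cores when tasks cannot be split:
  // at least the perfectly balanced share and at least the longest task.
  def lowerBoundMs(cores: Int): Double =
    math.max(totalMs.toDouble / cores, longestMs.toDouble)

  def main(args: Array[String]): Unit = {
    println(s"total work: $totalMs ms")
    println(s"longest task: $longestMs ms")
    println(f"lower bound on 8 cores, indivisible tasks: ${lowerBoundMs(8)}%.0f ms")
    println(f"8 cores, if all work were divisible: ${totalMs / 8.0}%.0f ms")
  }
}
```

This is why splitting processors' priming internally matters more than parallelizing the smaller Eidos components: the longest task sets the floor.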
MihaiSurdeanu commented 4 years ago

Thanks @kwalcock ! We can definitely handle this inside processors.

kwalcock commented 4 years ago

See also #526

kwalcock commented 4 years ago

And #639

kwalcock commented 4 years ago

Startup duration and frequency (how often a restart needs to be done) for Eidos, particularly during development, are influenced by multiple components, some at odds with each other. Here are just a few things to consider:

kwalcock commented 4 years ago

My current favorite proposed solution has two parts.

  1. Most grammar-type resources are loaded with code like

    val entityRules = FileUtils.getTextFromResource(entityRulesPath)
    val entityEngine = ExtractorEngine(entityRules)

    This can be changed to

    val entityRules = Resourcer.getText(entityRulesPath)
    val entityEngine = ExtractorEngine(entityRules)

    This Resourcer would take the incoming resource path, convert it into the filename that the resource has at development time, and read the text of that file if it is available. If it is not, or perhaps if it's obvious that we're not in development mode, it falls back to the resource. Typically this just means that "./src/main/resources" is prepended to the path and getTextFromResource is replaced by getTextFromFile. With this change, any text updated from the IDE is immediately available in the running program, at least in a program that supports reloading, like EidosShell. I tried this out in IntelliJ by changing entities.yml, typing :reload, and then resubmitting the previously entered sentence. In about 10 seconds the new rule was applied.

  2. If the output of the webapp is much preferred to that of EidosShell, then generation of the web page should be modularized so that other applications can call it. This shouldn't be much more than a separate output format added to the present list. It is probably complicated by references to additional files for CSS, images, JavaScript, and the like, but I don't think that's a significant technical problem; it is mostly a matter of (re)organization. The developer would open a browser on the output file, such as ../eidosshell.html, and view the result. A refresh might be required to see the output of a new query. That can be fixed by adding a meta refresh (which has some annoying side effects) or, if necessary, with some kind of browser extension.
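Part 1 of the proposal might look something like the Scala sketch below. Resourcer is only the name proposed above, not an existing class in Eidos or processors; the constructor parameter, the devFile helper, and the default of src/main/resources resolved against the working directory are assumptions made for illustration. The sketch prefers the editable source file and falls back to the classpath resource when that file is absent, which is what happens when the program runs from a packaged jar outside the project directory.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import scala.io.Source

// Hypothetical sketch of the proposed Resourcer; not an existing Eidos or processors class.
class Resourcer(devResourceDir: Path) {

  // Map a resource path such as "/grammars/entities.yml" (illustrative) to where
  // the same file lives in the source tree during development.
  def devFile(resourcePath: String): Path =
    devResourceDir.resolve(resourcePath.stripPrefix("/"))

  // Prefer the editable file when it exists; otherwise fall back to the classpath resource.
  def getText(resourcePath: String): String = {
    val file = devFile(resourcePath)
    if (Files.isRegularFile(file))
      new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
    else {
      // ClassLoader resource names never start with "/".
      val stream = getClass.getClassLoader.getResourceAsStream(resourcePath.stripPrefix("/"))
      if (stream == null)
        throw new IllegalArgumentException(s"Resource not found: $resourcePath")
      val source = Source.fromInputStream(stream, "UTF-8")
      try source.mkString
      finally source.close()
    }
  }
}

// Default instance, assuming the program is started from the project root.
object Resourcer extends Resourcer(Paths.get("src", "main", "resources"))
```

Because calling code only swaps FileUtils.getTextFromResource for Resourcer.getText, the ExtractorEngine line stays unchanged, and a reload would pick up an edited rule file simply by calling getText again.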

BeckySharp commented 4 years ago

It is indeed complicated by JavaScript/CSS, etc., but cool if that's not a deal breaker! Is this faster than, or as fast as, having a processors server? The advantage of the server is that other projects could benefit too...

(Replying by email to Keith Alcock's comment of Wed, May 13, 2020 at 2:41 PM.)


kwalcock commented 4 years ago

Part 1 needs to be done, I think, to keep Eidos from having to restart, whether or not processors runs as a server. If only rules, and not code, are changing, then this reload approach should be faster. If the code changes and Eidos restarts, Eidos restarts processors too. (Although I think Eclipse has hot code replacement and wouldn't need to restart even Eidos.) This would be slower than the server option, in which processors could keep running.

This Resourcer might eventually be involved in checking whether something in the cache needs to be updated because of an updated local resource or something on GitHub.
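For local files, such a cache check could be as simple as comparing modification times. The Scala sketch below is a hypothetical illustration (ReloadingCache, getText, and reads are invented names); it does not cover checking resources on GitHub, and a file saved twice within the file system's timestamp resolution could be missed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import scala.collection.mutable

// Hypothetical sketch: remember each file's text together with its modification time
// and re-read the file only when that time has changed.
class ReloadingCache {
  private case class Entry(modified: FileTime, text: String)

  private val entries = mutable.Map.empty[Path, Entry]
  private var readCount = 0

  // Number of times a file was actually read from disk (useful for checking the cache).
  def reads: Int = readCount

  def getText(file: Path): String = {
    val modified = Files.getLastModifiedTime(file)
    entries.get(file) match {
      case Some(entry) if entry.modified == modified => entry.text // unchanged, reuse
      case _ =>
        val text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
        readCount += 1
        entries(file) = Entry(modified, text)
        text
    }
  }
}
```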