kwalcock commented 4 years ago

Especially during rule development, the round-trip time for incorporating a new rule is too high. One part of the solution is to use a server version of processors. That may be a good idea on its own, but there are other possibilities, and they can be discussed here.

kwalcock commented 4 years ago

Below are recent startup times for Eidos. Loading all the components in parallel takes just over 36 seconds. Getting processors loaded and primed accounts for just under 36 seconds of that. Constructing the processor object itself takes only 15 ms; the rest is "priming," which covers lazy initialization and other on-demand work. If all the activities above the blank line in the table could be performed in parallel on 8 processors, the roughly 66 seconds of total work might be cut down to about 8. One problem is that Eidos can't split processors into smaller, parallelizable components. It might be done internally, though.

| Component | Time (ms) |
|---|---:|
| NestedArgumentExpander | 1 |
| MigrationHandler | 3 |
| NegationHandler | 4 |
| HedgingHandler | 8 |
| Processors | 15 |
| AdjectiveGrounder | 16 |
| ConceptExpander | 19 |
| StopwordManager | 274 |
| ExtractorEngine | 6112 |
| EntityFinders | 10009 |
| OntologyHandler | 14022 |
| ProcessorsPrimer | 35818 |
| | |
| Complete parallel load | 36259 |
| EidosPrimer | 6261 |
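The 8-processor estimate can be checked against the table. The sketch below (`StartupEstimate` is a hypothetical name used only for illustration) sums the rows above the blank line and computes a simple lower bound for any parallel schedule. No schedule on n processors can finish before the average work per processor, nor before the single longest task, assuming that task can't be split.

```scala
// Hypothetical helper for checking the startup arithmetic; not part of Eidos.
object StartupEstimate {
  // Rows above the blank line in the table: component -> load time in milliseconds
  val componentTimesMs: Seq[(String, Long)] = Seq(
    "NestedArgumentExpander" -> 1L,
    "MigrationHandler" -> 3L,
    "NegationHandler" -> 4L,
    "HedgingHandler" -> 8L,
    "Processors" -> 15L,
    "AdjectiveGrounder" -> 16L,
    "ConceptExpander" -> 19L,
    "StopwordManager" -> 274L,
    "ExtractorEngine" -> 6112L,
    "EntityFinders" -> 10009L,
    "OntologyHandler" -> 14022L,
    "ProcessorsPrimer" -> 35818L
  )

  // Total work if everything ran one after another
  def totalMs(times: Seq[Long]): Long = times.sum

  // A perfect schedule can't beat the average share per processor,
  // nor the longest task when that task is indivisible
  def lowerBoundMs(times: Seq[Long], processors: Int): Long = {
    val perProcessor = (times.sum + processors - 1) / processors // ceiling division
    math.max(perProcessor, times.max)
  }
}
```

With the numbers above, the total is 66,301 ms and the average share on 8 processors is about 8.3 s. However, the bound is 35,818 ms because ProcessorsPrimer can't be split from Eidos's side, and the measured complete parallel load of 36,259 ms is already close to it. Getting near 8 s would require splitting the priming work itself.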

MihaiSurdeanu commented 4 years ago

Thanks @kwalcock ! We can definitely handle this inside processors.

kwalcock commented 4 years ago

See also #526

kwalcock commented 4 years ago

And #639

kwalcock commented 4 years ago

Startup duration and frequency (how often a restart is needed) for Eidos, particularly during development, are influenced by multiple components, some at odds with each other. Here are just a few things to consider:

- Play runs the webapp, which provides the most useful output format we have. Play runs in different modes, and we have the most experience with development mode. Unfortunately, in this mode Play monitors source and resource files for changes and restarts Eidos entirely whenever it detects a relevant one. This is too expensive. In run mode, Play ignores the changes. Play is somewhat integrated into sbt, and accessing it through IntelliJ is probably difficult.
- sbt is a nice build tool, but it's not a development environment. It builds by constructing an entire new jar file for the application and isn't good at incremental work.
- IntelliJ does well with incremental compilation and has a great debugger. However, a modified resource apparently cannot be accessed until the program is rerun: even though the file in the resource directory has changed, the copy of the resource that the running program reads is not automatically updated. The Ultimate edition of IntelliJ seems to be able to work more directly with Play applications: https://www.jetbrains.com/help/idea/play.html# and https://www.jetbrains.com/idea/features/editions_comparison_matrix.html.
- A web browser is used to view the output of the webapp. If the browser were instead used to view a file on disk, current browsers do not appear to update their display automatically when the file changes; the user needs to refresh the view.
- The debugger in IntelliJ works great on the program at hand, but if a server is involved, a remote debugging session will probably be needed.
- EidosShell might not have the best output, but I don't think anything prevents it from producing a web page instead of text.

kwalcock commented 4 years ago

My current favorite proposed solution has two parts.

First, most grammar-type resources are loaded with code like
```scala
// Read the rule text from the classpath resource, then build an Odin engine from it
val entityRules = FileUtils.getTextFromResource(entityRulesPath)
val entityEngine = ExtractorEngine(entityRules)
```
This can be changed to
```scala
// Resourcer is the proposed helper: prefer the development-time file, else the resource
val entityRules = Resourcer.getText(entityRulesPath)
val entityEngine = ExtractorEngine(entityRules)
```
This Resourcer would take the incoming resource path, convert it into the filename that applies at development time, and read the text of that file if it is available. If it is not, or perhaps if it's obvious that we're not in development mode, it falls back to the resource. Typically this just means prepending "./src/main/resources" to the path and changing getTextFromResource to getTextFromFile. With this change, any text updated from the IDE is immediately available to the running program, at least to a program that supports reloading, like EidosShell. I tried this out in IntelliJ by changing entities.yml, typing :reload, and then resubmitting the previously entered sentence. The new rule was applied in about 10 seconds.
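A minimal sketch of such a Resourcer, assuming Scala 2.12 or later and resource paths like "/org/clulab/..." (a missing leading "/" is tolerated). The object itself, the `devResourceDir` parameter, and the `./src/main/resources` default are illustrative assumptions, not existing Eidos code. It reads the development file with `java.nio` when that file exists and otherwise falls back to `getClass.getResourceAsStream`.

```scala
import java.io.{File, FileNotFoundException}
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

// Hypothetical helper sketched from the proposal; not an existing Eidos API.
object Resourcer {
  // Assumed location of the resource sources at development time
  val DefaultDevResourceDir: String = "./src/main/resources"

  // Map a classpath resource path to the source file it is built from
  def toDevFile(resourcePath: String, devResourceDir: String = DefaultDevResourceDir): File =
    new File(devResourceDir, resourcePath.stripPrefix("/"))

  def getText(resourcePath: String, devResourceDir: String = DefaultDevResourceDir): String = {
    val file = toDevFile(resourcePath, devResourceDir)
    if (file.isFile) {
      // Development: read the file the IDE edits, so changes appear on the next reload
      new String(Files.readAllBytes(file.toPath), StandardCharsets.UTF_8)
    } else {
      // Otherwise fall back to the packaged resource on the classpath
      val absolutePath = if (resourcePath.startsWith("/")) resourcePath else "/" + resourcePath
      val stream = getClass.getResourceAsStream(absolutePath)
      if (stream == null) throw new FileNotFoundException(s"Resource not found: $resourcePath")
      try Source.fromInputStream(stream, "UTF-8").mkString
      finally stream.close()
    }
  }
}
```

Checking for the file first means a packaged application run from another working directory normally won't find `./src/main/resources` and will simply use the resource, so behavior outside development should be unchanged.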
Second, if the webapp's output is much preferred over EidosShell's, then the generation of the web page should be modularized so that other applications can call it. This shouldn't be much more than a separate output format added to the present list. It is probably complicated by references to additional files for CSS, images, JavaScript, and the like, but I don't think that's a significant technical problem; it mostly involves (re)organization. The developer would open a browser on the output file, like ../eidosshell.html, and view the result. A refresh might be required to see the output of a new query. That can be addressed by adding a meta refresh (which has some annoying side effects) or, if necessary, with some kind of browser extension.
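A rough sketch of the file-based output with an optional meta refresh. `HtmlOutput`, `page`, and `writePage` are hypothetical names; the body HTML would come from whatever the webapp currently generates, and linked CSS and JavaScript are left out. The `<meta http-equiv="refresh">` tag asks the browser to reload the page every N seconds, which is also where the side effects of that approach come from.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical web-page output format for EidosShell; names are illustrative only.
object HtmlOutput {
  // Escape text so sentence contents can't be mistaken for markup (& must go first)
  def escape(text: String): String =
    text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Build a standalone page; with refreshSeconds set, a meta refresh tells the
  // browser to reload the file on that interval
  def page(title: String, bodyHtml: String, refreshSeconds: Option[Int]): String = {
    val refresh = refreshSeconds.map(s => s"""<meta http-equiv="refresh" content="$s">""").getOrElse("")
    Seq(
      "<!DOCTYPE html>",
      "<html>",
      "<head>",
      """<meta charset="UTF-8">""",
      refresh,
      s"<title>${escape(title)}</title>",
      "</head>",
      "<body>",
      bodyHtml,
      "</body>",
      "</html>"
    ).mkString("", "\n", "\n")
  }

  // Overwrite the output file after each query
  def writePage(path: Path, title: String, bodyHtml: String, refreshSeconds: Option[Int] = None): Unit = {
    Files.write(path, page(title, bodyHtml, refreshSeconds).getBytes(StandardCharsets.UTF_8))
    ()
  }
}
```

For example, a shell could call `HtmlOutput.writePage(Paths.get("../eidosshell.html"), "EidosShell", body, Some(2))` after each query. Leaving `refreshSeconds` as `None` avoids the meta refresh entirely and relies on a manual refresh or a browser extension instead.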

BeckySharp commented 4 years ago

It is indeed complicated by javascript/css etc... but cool if that's not a deal breaker! Is this faster than, or as fast as, having the processors server? The advantage of the server is that other projects could benefit too...

On Wed, May 13, 2020 at 2:41 PM Keith Alcock notifications@github.com wrote:


kwalcock commented 4 years ago

The first part (the Resourcer) needs to be done to keep Eidos from having to restart, I think, whether or not processors is a server. If only rules and not code are changing, then this reload version should be faster. If the code changes and Eidos restarts, Eidos restarts processors as well. (Although I think Eclipse has hot code replacement and wouldn't even need to restart Eidos.) This would be slower than the server option, in which processors could keep running.

This Resourcer might eventually be involved in checking whether something in the cache needs to be updated because of a changed local resource or something on GitHub.
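For the local-resource case, the check could start as simply as comparing modification times. In this sketch, `ReloadingTextCache` is a hypothetical class: it re-reads a file only when its `lastModified` value differs from the cached one. The GitHub case would need a different freshness check and is not covered here.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.collection.mutable

// Hypothetical cache of development-time file text, keyed by canonical path.
class ReloadingTextCache {
  private case class Entry(lastModified: Long, text: String)
  private val entries = mutable.Map.empty[String, Entry]
  private var readCount = 0 // number of actual disk reads, useful for testing

  def reads: Int = readCount

  // Return cached text unless the file's modification time has changed
  def getText(file: File): String = synchronized {
    val key = file.getCanonicalPath
    val modified = file.lastModified()
    entries.get(key) match {
      case Some(entry) if entry.lastModified == modified => entry.text
      case _ =>
        val text = new String(Files.readAllBytes(file.toPath), StandardCharsets.UTF_8)
        readCount += 1
        entries(key) = Entry(modified, text)
        text
    }
  }

  // True if the file was never cached or has changed since it was cached
  def isStale(file: File): Boolean = synchronized {
    entries.get(file.getCanonicalPath).forall(_.lastModified != file.lastModified())
  }
}
```

`isStale` could drive a reload command like EidosShell's `:reload`, so that only rule files that actually changed rebuild their `ExtractorEngine`.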

clulab / eidos

Eidos should restart faster or less often #849