Deliver a new AllenNLP demo (Tugboat).

jonborchardt commented 3 years ago

External Doc:

Motivation:

AllenNLP is our most popular demo, but the codebase is hard to extend. We'd like to rebuild the UI to better serve AllenNLP, but with a general approach that we could also apply to Mosaic KG, the PRIOR demos, and other collections of demos.

Success Criteria:

Sam and Jon sign off on the Tugboat Pattern used in AllenNLP demo
We port over the following demos:
- [x] Reading Comprehension
- [x] Named Entity Recognition
- [x] Open Information Extraction
- [x] Semantic Role Labeling
- [x] Dependency Parsing
- [x] Constituency Parsing
- [x] Coreference Resolution
- [x] Textual Entailment
- [x] Masked Language Modeling
- [x] Visual Question Answering
- [x] Sentiment Analysis
- [x] Language Modeling
- We've fixed some bugs as part of warranty work:
- [x] Add a <SingleModelDemo /> so that we don't end up with urls like: https://demo.allennlp.org/open-information-extraction/open-information-extraction
- We have the following higher-level capabilities in place:
- [x] We can capture client-side exceptions, so that we know when these failures occur.
- [x] We have some sort of process for reviewing and acting on these exceptions as necessary.
- The models in the previous list include model cards.
- We've deleted the legacy code that's no longer used:
- [x] Removed unused UI code
- [x] Removed Permalinks API
- [x] (We keeping GA for now) Removed Google Analytics
- Update dependencies
- [x] Dependencies updated after all js code removed
- There are some regressions that need to be addressed
- [x] Training configuration URLs are fixed per task.
- [x] Right now developing locally with a single model doesn't work with demos that use the updated UI code because the info route isn't ran locally.

Updated Estimate:

We have removed some of the tasks, so this is what is remaining for q1 There are 13 demos in total, of which 2 are very hard, 3 are hard and 7 are easy, which means we expect this to take 4-5 more weeks.

Hard
- Reading comprehension (13 d) more than half done. (3d remaining)
- Dependency Parsing (4 d)
Medium
- Coreference Resolution (3d)
- Language Modeling (3d)
- Masked language model (3d)
Easy
- The other 7, eg. VQA, Named Entity Recognition, etc… (0-2d each, 7d total)

Related to:

schmmd commented 3 years ago

Old issue: https://github.com/allenai/reviz/issues/172

codeviking commented 3 years ago

Today we ran into a bit of an issue related to the integration of model cards into the UI.

The AllenNLP platform team has expressed a desire to migrate away from model ids that don't include the task. For instance, instead of using an id like bidaf, the id will be rc-bidaf, where rc indicates the task that the model was trained for.

This change was already made in the API that returns information about model cards. This means we can't lookup the model card for a model, unless we know the id in the new format. Some models in the demo already have this, in a field called pretrained_model_id, but others (like the NMN model) do not.

I put some duct-tape in place today to unblock @jonborchardt, but neither of us think we should maintain this solution, as we'll just be doing more of this as we convert additional tasks, or as we add new, similar models to the demo in the future.

Instead we'd like to migrate the demo to use the new id format, though this introduces some unexpected work.

The existing endpoints are of the form /api/:model-id, where :model-id is of the old form (i.e. bidaf). These URLs are hard-coded throughout the existing demo code, which means it'd be hard to update them in a single pass without breaking things (as it's a lot to test!). Instead we'd like to do a more iterative migration.

We'll add a new (v1) API for producing the Kubernetes manifests for an application, that takes an additional parameter for specifying a list of alternative paths to be routed to the endpoint. This new API will use the /api/model prefix instead of /api.
```
common.v1.APIEndpoint(model.id, altPaths=['/api/bidaf'])
```
We'll port models incrementally to the new API. As we do so support will be added for the new urls (/api/model/rc-bidaf) while retaining support for the old one (/api/bidaf).
```
{
- id: 'bidaf'
+ id: 'rc-bidaf'
}
```
Since the id is changing, every deployment will create a new Kubernetes deployment. So we'll need to make sure to delete the old ones as this incremental migration occurs.
We'll port the Reading Comprehension endpoints first, after which we can modify the new UI code to use the new ID format. At this point I'll be able to remove the hack mentioned above.
We'll port the endpoints for the other tasks and models to the new mechanism.
Once all of them are ported, we can remove the old API for configuring Kubernetes things, so that only v1 exists.
Once all UI code is ported to the new mechanism, we can remove the altPaths if we want, thereby deprecating the old URL format.

As part of this I'd also like to do a little cleanup. Right now each endpoint returns some information when querying/api/:model-id. For instance, hitting /api/bidaf returns some basic information about that model. This information isn't used for anything except for the main info route.

I'd like to change this information to be returned when hitting /api/:model-id/info instead, so that /api/:model-id can be reserved for other behavior. This is important for the /tasks endpoint that I added recently, as the/tasks endpoint right now returns the info, which means the list of tasks is returned by /api/tasks/all, which is a little silly. If we make this change then /api/tasks can return all of the tasks in the demo, which is less surprising.

Also, it's worth noting that I don't plan on migrating the python packages to match the new id syntax. So api/allennlp_demo/bidaf will remain unchanged.

jonborchardt commented 3 years ago

this makes sense!

codeviking commented 3 years ago

I think we need to do this too: https://gist.github.com/codeviking/e69df59f664380276a21efec8337478e

allenai / allennlp-demo