Improve the `clinical_pipeline` example

mylibrar commented 2 years ago

Is your feature request related to a problem? Please describe. The current clinical_pipeline example needs to be fixed according to the latest updates in stave of how to handle remote pipeline service. stave will no longer assemble an NLP pipeline itself. Instead, stave now uses a pipeline with RemoteProcessor to call remote service.

Describe the solution you'd like Some of the steps in README.md might not work anymore. Hence we need to improve the example to make it solid:

Install extra dependencies: The command line to install forte-wrappers[elastic,nltk] might no longer work and we need to update it based on forte-wrapper repo.
Prepare elastic searcher: We may want to point out a specific version for user to download and install since the latest elasticsearch(8.1.2) is not compatible with this example.
Run Indexer: Here we should provide a download link to mimiciii/1.4/NOTEEVENTS.csv.gz. For example, https://physionet.org/content/mimiciii-demo/1.4/D_ICD_DIAGNOSES.csv.
Stave:
- We want to provide a remote pipeline service for LastUtteranceSearcher. It should be similar to this tutorial. The instruction will no longer require users to define the environment variables since they should be already configured in the service pipeline, but we still need to include the steps of starting a remote service.
- utterance_searcher.py needs to be refactored. Currently it's directly modifying the stave database, and this is not a safe operation. We should utilize backend APIs to interact with stave database. A wrapper of these APIs is provided in stave backend lib.
- The example requires user to setup two stave projects and to manually configure the database path. With the new interfaces provided by stave, we can automate these steps in a python script. Ideally users only need to run a python script, which should pop out a chatbot window for users to enter queries.

Piyush13y commented 2 years ago

This is not specifically with regards to remote processing of stave, but general working of stave with our pipeline outputs. I was just testing out stave with some json annotated output. I uploaded the base annotations first (which isn't base_ontology.json but a customized ontology json, where in I just had DrugLookupWindowAnnotation, NegationContext, UMLSConceptLink, MedicalEntityMention and MedicalArticle annotations. However, when I upload the data pack json output in the next page, I see only DrugLookupWindowAnnotation and NegationContext in the left pane/annotation legends (output does have MedicalEntityMention annotations). I am not sure why that's happening.

hunterhector commented 2 years ago

There are two places that could affect this, we know that this is not very smooth right now:

https://github.com/asyml/forte/blob/master/forte/pipeline.py#L197, currently our pipeline will use base_ontology to populate an internal ontology if not provided. So most likely it didn't take all the new entry types you defined.
The project config to Stave will further determine which types will be shown on the left pane https://github.com/asyml/forte/blob/master/forte/processors/stave/stave_processor.py#L281

As for 1, right now we are working on some new mechanism to populate the ontology, in the future, we should have all "used" entry types, but not just base_ontology. But this is not yet implemented.

As for 2, @mylibrar may know some details. I forget how we define the default configs.

hunterhector commented 2 years ago

Quick fix for 1 is to feed an ontology file to Pipeline at the begining. Note that it need to be a "full" ontology file, this can be obtained by using the "merged" option here: https://asyml-forte.readthedocs.io/en/latest/toc/ontology_generation.html#ontology-generation-steps

Piyush13y commented 2 years ago

Thanks @hunterhector. To clarify, I am currently just using an ontology json file and the output json file to work with a new Stave project. When I check the output json file, I do see new entry type annotations being detected which means pipeline did work with expected ontologies; however, when I upload the output json file to stave project, I don't see the new types in the left pane. Also, since we aren't using StaveProcessor in the pipeline directly for this example, but using json files to Stave, I feel 2nd point can't be the culprit either? Not sure if I am understanding this right

mylibrar commented 2 years ago

Thanks for the feedback. Could you provide the ontology.json and datapack.json that you used for testing? I'll try to replicate this issue on my side.

hunterhector commented 2 years ago

I do see new entry type annotations being detected which means the pipeline did work with expected ontologies;

So the output json will definitely work that the pipeline works fine, but that doesn't mean the pipeline pass all the types to stave, you need to check this variable in particular (located in pipeline resources): https://github.com/asyml/forte/blob/master/forte/processors/stave/stave_processor.py#L152

Actually, I am pretty sure that if you didn't pass your ontology through Pipeline initialization, Stave won't be able to pick it up.

Piyush13y commented 2 years ago

Thanks for the feedback. Could you provide the ontology.json and datapack.json that you used for testing? I'll try to replicate this issue on my side.

Onto_temp is the ontology json file, the other file is the output datapack onto_temp.json.zip 211230762558809616268911394647227693964.json.zip

mylibrar commented 2 years ago

I think the reason why you can only see DrugLookupWindowAnnotation and NegationContext in the left pane is that stave is skipping all the Generics entries when generating the legends. Actually this is an expected behavior because it's hard for stave to render the Generics type. So if you want stave to display all the ontology definitions, maybe you can reconsider the ontology designing and use more Annotations and Links.

Piyush13y commented 2 years ago

@mylibrar On a completely different note, I was able to start stave server without any problem until today. I then reinstalled it from source and now it throws 500s whenever I try to access my localhost. I am sure its some small thing that I am missing but I tried some stuff out and wasnt able to resolve it. Please advise. Following is the stack trace for the issue:

(base) piyush@Piyushs-Air simple-backend % stave start -o -n 8880
For more details, check out log file at /Users/piyush/.stave/log.
Starting Stave server at http://localhost:8880.
Quit the server with CONTROL-C.

[ERROR] 2022-04-26 17:58:32,394 web Uncaught exception GET / (::1)
HTTPServerRequest(protocol='http', host='localhost:8880', method='GET', uri='/', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/Users/piyush/miniforge3/lib/python3.9/site-packages/tornado/web.py", line 1702, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "/Users/piyush/casl/stave/simple-backend/stave_backend/lib/stave_viewer.py", line 214, in get
    self.render(self._build_path + "/index.html")
  File "/Users/piyush/miniforge3/lib/python3.9/site-packages/tornado/web.py", line 863, in render
    html = self.render_string(template_name, **kwargs)
  File "/Users/piyush/miniforge3/lib/python3.9/site-packages/tornado/web.py", line 1009, in render_string
    t = loader.load(template_name)
  File "/Users/piyush/miniforge3/lib/python3.9/site-packages/tornado/template.py", line 446, in load
    self.templates[name] = self._create_template(name)
  File "/Users/piyush/miniforge3/lib/python3.9/site-packages/tornado/template.py", line 477, in _create_template
    with open(path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/piyush/casl/stave/simple-backend/stave_backend/lib/build/index.html'
[ERROR] 2022-04-26 17:58:32,398 web 500 GET / (::1) 4.77ms

mylibrar commented 2 years ago

The recommended method to install stave is pip install stave==0.0.2.dev1. Right now installing stave directly from source might not work because there's actually some extra steps to build the package:

cd into stave/
You need to run yarn build to generate a bundle folder called ./build
Move the ./build folder into ./simple-backend/stave_backend/lib/
Run pip install -e ./

You can refer to the workflow here to see what steps are required for packaging.

Piyush13y commented 2 years ago

@mylibrar Thanks for that. Is it at all possible to run the current clinical_processing_pipeline.py i.e. with the present code? I followed README and I landed up on the query_chatbot.json showing up in the annotation viewer instead of the dialogue like interface. Like so

Piyush13y commented 2 years ago

@mylibrar Thanks for that. Is it at all possible to run the current clinical_processing_pipeline.py i.e. with the present code? I followed README and I landed up on the query_chatbot.json showing up in the annotation viewer instead of the dialogue like interface. Like so

Nevermind, resolved this

Piyush13y commented 2 years ago

I was trying out the following design to get the skeleton working and just thought I would share it with you. Please advise or suggest if you have anything in mind. The code is in works,

9206E1C5-0E6D-4982-BF18-E29606236CCE

mylibrar commented 2 years ago

The overall design looks good to me. One thing to note is that PL1 is already implemented by stave. Any stave project with its layout configured to DialogueBox should be able to load this pipeline. So you only need to implement PL2 and make sure it's compatible with PL1. You can also use the Eliza project in stave to test your remote service (refer to this tutorial for details).

Piyush13y commented 2 years ago

@mylibrar I realized that all string attributes in the annotations are truncated after 3 characters. Is that configurable in stave?

mylibrar commented 2 years ago

@mylibrar I realized that all string attributes in the annotations are truncated after 3 characters. Is that configurable in stave?

I guess it's not configurable now. Looks like it's hardcoded when displaying a string attribute in floating and in sidebar.

Piyush13y commented 2 years ago

@mylibrar I am done with all the other changes, and was trying to refactor utterance_searcher.py with the latest APIs. However, I see that the API calls aren't going through from the utterance_searcher.py file. On the other hand, when I use the same APIs in the clinical_processing_pipeline.py file, they work just fine. When trying to call them from utterance_searcher.py, I can see in the browser, that the API call goes on indefinitely and the page just becomes unresponsive. This happens with any call that I try, be it login, create document, etc.

The control flow gets stuck at Line 90, printing the print statement above but not below the login API. I am clueless as to what might be happening here.

mylibrar commented 2 years ago

@mylibrar I am done with all the other changes, and was trying to refactor utterance_searcher.py with the latest APIs. However, I see that the API calls aren't going through from the utterance_searcher.py file. On the other hand, when I use the same APIs in the clinical_processing_pipeline.py file, they work just fine. When trying to call them from utterance_searcher.py, I can see in the browser, that the API call goes on indefinitely and the page just becomes unresponsive. This happens with any call that I try, be it login, create document, etc.

The control flow gets stuck at Line 90, printing the print statement above but not below the login API. I am clueless as to what might be happening here.

My guess is that a deadlock occurs when the remote service invoked by Stave backend posts a request back to Stave. Django is a synchronous framework which means it can only take one request at a time. We can update the backend using some async mechanisms but it's nontrivial. Right now I guess a better solution is to directly interact with the sqlite database just as before using sqlite_insert and we can consider updating the implementation in future.

asyml / forte

Improve the `clinical_pipeline` example #745