Open Leolty opened 2 years ago
Besides Disease and Medical, what other annotations can we add?
BioBERT: https://github.com/dmis-lab/biobert
It should be based on the ner_type we can predict; we need to study the model outputs.
Yeah, here is the problem. In the config of this example, ner_type is set to Disease, so all the model outputs are labelled Disease. If I remove this setting and run the pipeline, all the outputs are labelled "BioEntity"; see the default configuration here (Line 235).
I could not find any instructions on how to change the entity type so that different kinds of types are shown, instead of every entity being labelled "BioEntity".
@hunterhector I think I found the problem. In the following file:
I checked the source code of BioBERTProcessor and noticed that the relationship between Line 235 and Line 228 does not seem to make sense. It simply labels every entity as "BioEntity". If I change the configuration to "DISEASE", every entity is labelled "DISEASE" instead, and I can actually set it to whatever I want.
For example, I changed the configuration to ner_type: "APPLE", and all the entities were labelled "APPLE".
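The behaviour described above can be pictured with a minimal sketch. It assumes, as observed in this thread, that the configured ner_type is attached to every detected span regardless of what the model predicted. `label_entities` and its arguments are hypothetical names for illustration, not the actual BioBERTProcessor code.

```python
from typing import List, Tuple


def label_entities(spans: List[str], ner_type: str = "BioEntity") -> List[Tuple[str, str]]:
    """Attach the single configured ner_type to every detected span.

    Hypothetical stand-in for the observed behaviour, not BioBERTProcessor itself.
    """
    return [(span, ner_type) for span in spans]


spans = ["hydrocephalus", "Lexapro"]
print(label_entities(spans))           # every span gets the default label
print(label_entities(spans, "APPLE"))  # any string is accepted as the label
```

Under this assumption, getting distinct types would require the label to come from the model's own prediction rather than from the configuration.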
Got it. I remember I once resolved an issue to support bio NER using stanza; I will try that.
I tried stanza, and the ner_type of the outputs are as follows:
- TEST: oxygen saturation / MRI of the head
- PROBLEM: an underlying restrictive ventilatory defect / hydrocephalus / shift of the normal midline strictures
- TREATMENT: Lexapro / sublingual nitroglycerin
We may change Disease and Medical to Test, Problem and Treatment.
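To compare label sets like this across models, the (text, type) pairs can be grouped by type. The sketch below is plain Python over example pairs taken from the list above. In practice the pairs would come from the NER model's output, and `group_by_type` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_type(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group entity texts under their ner_type, preserving first-seen order."""
    grouped = defaultdict(list)
    for text, ner_type in entities:
        grouped[ner_type].append(text)
    return dict(grouped)


# Example pairs copied from the stanza outputs listed in this thread.
entities = [
    ("oxygen saturation", "TEST"),
    ("MRI of the head", "TEST"),
    ("hydrocephalus", "PROBLEM"),
    ("Lexapro", "TREATMENT"),
    ("sublingual nitroglycerin", "TREATMENT"),
]
for ner_type, texts in group_by_type(entities).items():
    print(f"{ner_type}: {' / '.join(texts)}")
```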
Yeah, double-check with @Piyush13y, since I am sure we also have more spaCy models.
Yes, we have more scispacy models that we can use and they give out different kinds of NER labels.
Ref: https://allenai.github.io/scispacy/
@Leolty I feel we can't just change the label type, for the reason I mentioned to you guys on the call. We want the users to see terms they understand in the legend, not NLP jargon. They wouldn't know what EntityMentions/MedicalEntityMentions mean. Also, adding more attributes (ner_type) to the same annotation will still require changes to the ontology file. We might as well create new annotations for each of the NER types for a smoother demo. At least, that's what I think, especially since it might not take much more time than the adjustable label type approach.
@hunterhector @Piyush13y I found a bug related to Stave. It is tiny, but it had me stuck for hours, so I will elaborate here.
We have a JSON file like this one: https://github.com/asyml/ForteHealth/blob/50_streamlit_to_stave/examples/search_engine_to_stave/default_onto_project.json
In the code, we usually create a new project with:
    session.create_project(project_json)
The project is created successfully, but I cannot open the documents in it; the page keeps loading. So I went through .stave/db.sqlite3 and compared the ontology and config in the table stave_backend_project:
- First, I found that the JSON file uses double quotation marks, but in the database they become single quotation marks. (Changing this with an SQL statement did not help.)
- Then, comparing carefully, I found that the config in the JSON file uses true and false, but once stored in the database these become True and False, whereas JSON requires true and false. (Changing this with an SQL statement worked perfectly!)
I think that is the key point: the source code of create_project() should be modified.
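The manual comparison above can be approximated with a small check that tries to parse each stored value as JSON. This is only a sketch: the table name comes from this thread, but treating ontology and config as text columns of that table is an assumption, and `find_invalid_json_fields` is a hypothetical helper. The demo runs on an in-memory database rather than the real .stave/db.sqlite3.

```python
import json
import sqlite3
from typing import List, Tuple


def find_invalid_json_fields(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Return (rowid, column) pairs whose stored text does not parse as JSON."""
    bad = []
    rows = conn.execute(
        "SELECT rowid, ontology, config FROM stave_backend_project"
    )
    for rowid, ontology, config in rows:
        for column, value in (("ontology", ontology), ("config", config)):
            try:
                json.loads(value)
            except (TypeError, ValueError):  # JSONDecodeError is a ValueError
                bad.append((rowid, column))
    return bad


# Self-contained demo on an in-memory database with the assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stave_backend_project (ontology TEXT, config TEXT)")
conn.execute(
    "INSERT INTO stave_backend_project VALUES (?, ?)",
    ('{"name": "demo"}', "{'enabled': True}"),  # second value is not valid JSON
)
print(find_invalid_json_fields(conn))
```

Against a real installation, the connection would instead be opened on the database file mentioned above.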
Hi, @Leolty. Thanks for exploring this. It seems you have found an interesting bug, and I believe it is related to this function. Would you mind creating an issue on Stave to discuss it?
The fix could be simple (fixing the quotation marks and case before storing the value in the database), but I am still wondering about the cause and the best solution:
- Double vs. single quotation marks: you mentioned that changing this does not fix the problem. I think that is because it is only part of the problem, but it should also be fixed, right?
- "True" vs. "true": similar to the above; the JSON spec requires "true". But where does the conversion go wrong in both cases? The JSON file we provided seems to be correct, and create_project simply sends the data via POST.
IMO, the best solution would be to find out which conversion step causes this, so that we can find a principled solution from there. Post-fixing the data inside the create_project function should be our last resort.
Hi, @hunterhector. After checking the function you sent me, I think I know where the bug is. As you mentioned, create_project is correct and the JSON file is correct. The bug occurs when loading the JSON file.
In Python, we usually load a JSON file like this:
    import json

    with open(file_path) as file_obj:
        project_json = json.load(file_obj)  # returns a Python dict, not a JSON string
    create_project(project_json)
I then passed project_json directly to create_project. Since project_json is a dict, this results in the single quotation marks and "True".
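A minimal sketch of why a dict produces those symptoms, assuming the dict is turned into text with Python's `str()` (or an equivalent repr-style conversion) somewhere before it is stored; the thread does not confirm the exact step. The `config` dict is a made-up example.

```python
import json

config = {"enabled": True, "name": "demo"}  # hypothetical config, for illustration

as_repr = str(config)         # Python repr: single quotes and True
as_json = json.dumps(config)  # JSON text: double quotes and true

print(as_repr)  # {'enabled': True, 'name': 'demo'}
print(as_json)  # {"enabled": true, "name": "demo"}


def is_valid_json(text: str) -> bool:
    """Report whether text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json(as_repr), is_valid_json(as_json))  # False True
```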
Actually, I just need to use the dumps function to solve this bug, for example:
    import json

    with open(file_path) as file_obj:
        project_json = json.load(file_obj)
    create_project(json.dumps(project_json))  # pass a JSON string instead of a dict
So I think there is no need to modify the source code. We just need to make sure the argument passed to create_project is a JSON-formatted string (double quotation marks, and "true"/"false") instead of a dict.
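One way to make that requirement harder to miss is a small guard on the caller's side. This is only a sketch: `to_json_text` is a hypothetical helper, not part of Stave or Forte, and it relies solely on the standard json module.

```python
import json
from typing import Any, Dict, Union


def to_json_text(project: Union[str, Dict[str, Any]]) -> str:
    """Return JSON text whether the caller passed a dict or a JSON string."""
    if isinstance(project, str):
        json.loads(project)  # raises ValueError if the string is not valid JSON
        return project
    return json.dumps(project)


print(to_json_text({"enabled": True}))    # dict is serialized to JSON text
print(to_json_text('{"enabled": true}'))  # valid JSON string is passed through
```

With such a helper, the call from this thread would become session.create_project(to_json_text(project_json)), and a repr-style string such as "{'enabled': True}" would fail early with a ValueError instead of being stored.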
Sounds good, thanks!
As mentioned in the meeting.