Closed Gautam-Rajeev closed 6 months ago
@GautamR-Samagra Hi, could I please get access to the datasets? I'd like to contribute to this issue.
Hey @GautamR-Samagra, I have written code that detects the required entities (time, email, phone number, number and unit) and also predicts the extracted time (in both Hindi and English) using regex: https://colab.research.google.com/drive/1DAg0xKBYMnXXcQzFwBK2aQppoyoddj1X?authuser=1#scrollTo=6qNzUpRqjnrS This is what I have done so far. I am working on integrating everything mentioned in the issue.
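For readers who cannot open the notebook, here is a minimal sketch of what regex-based extraction of number-plus-unit and clock-time entities might look like. It is not the notebook's code: the unit list, the pattern names, and the function names are all assumptions made for illustration, and the time pattern covers only English-style 12-hour times.

```python
import re

# Hypothetical unit list for illustration; the notebook's actual list may differ.
UNITS = ["kg", "g", "ml", "litre", "liter", "acre", "hectare", "किलो", "लीटर"]

# Put longer alternatives first so a short unit never shadows a longer one.
_unit_alt = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

# A number (optionally decimal) followed by a known unit.
# The lookahead rejects cases like "10 grams" being read as "10 g".
NUMBER_UNIT = re.compile(rf"(\d+(?:\.\d+)?)\s*({_unit_alt})(?![A-Za-z])")

# 12-hour clock times such as "5 pm" or "10:30 AM".
TIME_12H = re.compile(r"\b(1[0-2]|0?[1-9])(?::([0-5]\d))?\s*([ap]m)\b", re.IGNORECASE)


def extract_number_unit(text):
    """Return (value, unit) pairs found in the text."""
    return [(float(num), unit) for num, unit in NUMBER_UNIT.findall(text)]


def extract_time(text):
    """Return the matched time strings exactly as they appear in the text."""
    return [m.group(0) for m in TIME_12H.finditer(text)]
```

Something like `extract_number_unit("Spray 2.5 kg per acre")` would then return the value together with its unit, which is easier to use in a downstream search than the raw string.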
unable to open it
please try now
Hi @basedsaksham, the idea is to treat NER as a model that does multiple things under the hood:
- It can be a seq-seq NER model based on this dataset. I have written some code to train such models here; you can use that as a starting point.
- It could be a simple regex-based operation to get other entities out, e.g. an email can be recognized by an @ followed by a domain name. I want a repo that combines all of these.
As input, I just pass an argument listing the entities I want to extract, and the model uses either regex or the seq-seq model to extract them.
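The dispatch described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the patterns are deliberately simplified, the phone pattern assumes 10-digit Indian mobile numbers, `extract_entities` and `model_extract` are hypothetical names, and the model call is a placeholder rather than the trained NER model. The label names `crop_name`, `crop_disease` and `crop_symptoms` are borrowed from later comments in this thread.

```python
import re

# Simplified patterns for illustration only; not full validators.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Assumption: 10-digit Indian mobile numbers with an optional +91 prefix.
PHONE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")

REGEX_ENTITIES = {"email": EMAIL, "phone_number": PHONE}
MODEL_ENTITIES = {"crop_name", "crop_disease", "crop_symptoms"}


def model_extract(text):
    """Placeholder for the trained NER model (hypothetical); returns no entities."""
    return {}


def extract_entities(text, entities, model=model_extract):
    """Route each requested entity type to the regex or the model backend."""
    # Run the model at most once, and only if a model-backed entity is requested.
    needs_model = any(name in MODEL_ENTITIES for name in entities)
    model_out = model(text) if needs_model else {}

    result = {}
    for name in entities:
        if name in REGEX_ENTITIES:
            result[name] = REGEX_ENTITIES[name].findall(text)
        elif name in MODEL_ENTITIES:
            result[name] = model_out.get(name, [])
        else:
            raise ValueError(f"Unknown entity type: {name}")
    return result
```

Keeping the two lookup tables separate means a new regex entity can be added without touching the model code, and the caller never needs to know which backend served a given entity.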
Hey @GautamR-Samagra, I worked on the NER notebook that you had given and tried to add crop_symptoms to it, along with crop_name and crop_disease. https://colab.research.google.com/drive/1SbbM0UG18a65mrFnILMQdquLS5BB_0e5?usp=sharing This is what I have done so far. Let me know how to proceed.
@adityathenerd let me know if you were able to fix issues with it. still seeing
@adityathenerd and @basedsaksham, you have worked on separate aspects of it: @adityathenerd on the NER model using DistilBERT, and @basedsaksham on the regex part of it.
We must integrate both parts into a new module, ner --> agri_ner, inside ai-tools.
The structure should mirror the existing model setup, such as the one for text classification, but with an extra file for each kind of NER we do.
The folder structure can look like this:
ai-tools/
└── ner
    └── agri_ner
        ├── Dockerfile
        ├── README.md
        ├── api.py
        ├── model.py
        ├── request.py
        ├── bert_nert.py
        ├── regex_parse_ner.py
        └── lookup_ner.py
Do collaborate with each other and make a PR to ai-tools on this.
regex NER here
Hey @GautamR-Samagra, I found out what the problem was. The dataset didn't have enough pest-related tags, so the model was not able to predict those well. I am adding 7-8 more pest-related sentences to the dataset, after which it should work fine. Will update by EOD.
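A quick way to spot this kind of gap before training is to count how often each entity type appears in the training data. The sketch below assumes the dataset is available as BIO tag sequences (one list of tags per sentence), which may not match the actual dataset layout; the function names and label names are hypothetical.

```python
from collections import Counter


def entity_counts(tag_sequences):
    """Count entity mentions per type, one count per B- tag."""
    counts = Counter()
    for tags in tag_sequences:
        for tag in tags:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


def underrepresented(counts, labels, min_count):
    """Return the labels that occur fewer than min_count times."""
    return [label for label in labels if counts.get(label, 0) < min_count]
```

Labels returned by `underrepresented` are the ones that most likely need extra annotated sentences.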
Model Link. Check now, @GautamR-Samagra.
@Shubh-Goyal-07 can you link your PR here?
The PR for the same has been made here: https://github.com/Samagra-Development/ai-tools/pull/317
@GautamR-Samagra
Goal:
We want to improve our NER model to include the entities that come out of this.
The overall goal is to be able to extract any relevant entity from a question, and to recognize which entity type it is, so that it can help us with a search.
Current state:
The model is created based on this dataset using this code.
The next steps are to extract the data from the pdf/csv, create the sentences in the required format (same as the dataset above), and then train the model. The 30k queries provided in the other ticket can also be used for this.
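As a rough illustration of the "create the sentences in the required format" step, the sketch below turns a sentence plus known entity phrases into token-level BIO tags. The BIO layout is an assumption, since the dataset's actual format is not shown in this thread, and `to_bio` is a hypothetical helper that uses plain whitespace tokenization and tags only the first occurrence of each phrase.

```python
def to_bio(sentence, entities):
    """Tag a sentence with BIO labels.

    entities is a list of (phrase, label) pairs. Matching is case-insensitive
    and ignores trailing punctuation on tokens.
    """
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    normalized = [tok.lower().strip(".,?!") for tok in tokens]

    for phrase, label in entities:
        words = phrase.lower().split()
        if not words:
            continue  # skip empty phrases
        n = len(words)
        for i in range(len(tokens) - n + 1):
            window_free = all(tag == "O" for tag in tags[i:i + n])
            if normalized[i:i + n] == words and window_free:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                break  # tag only the first occurrence
    return list(zip(tokens, tags))
```

For the pdf/csv data, each row's crop, disease, or symptom phrase would supply the `(phrase, label)` pairs, and the 30k queries could supply the sentences.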
Some pre-decided entities are also here:
We need to have a common model that is able to detect all these entity types. We should be able to input a sentence and get back the entities detected for the sentence.