HumanSignal / Adala

Adala: Autonomous DAta (Labeling) Agent framework
https://humansignal.github.io/Adala/
Apache License 2.0

Create Named Entity Recognition Skill #30

Open tianchiguaixia opened 1 year ago

tianchiguaixia commented 1 year ago

This is very important. Can we do this first?

niklub commented 1 year ago

Hi, @tianchiguaixia! Absolutely. Do you have any use case in mind to test it out?

tianchiguaixia commented 1 year ago

I have many projects that turn OCR output from text and images into structured data.

  1. With BERT, each project needs continuously annotated training data, which costs a lot of time. The fields to extract also differ between projects, so one model is hard to reuse across them.

  2. With an LLM, prompts alone achieve partial structuring, but the accuracy is lower than that of a fine-tuned BERT model.

The ideal method is to combine an LLM with system knowledge (existing knowledge) for information extraction and NER.

As more knowledge is filled into the system, the model's effectiveness improves.
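
A minimal sketch of the "LLM + system knowledge" idea, assuming the knowledge is a store of human-verified extraction examples that are retrieved as few-shot demonstrations. `KnowledgeStore`, `build_prompt`, and the word-overlap similarity are hypothetical illustrations, not part of Adala; a real system would probably use embeddings for retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeStore:
    """Hypothetical in-memory store of human-verified extraction examples."""
    examples: list = field(default_factory=list)

    def add(self, text, entities):
        # Each verified sample becomes a candidate few-shot demonstration.
        self.examples.append((text, entities))

    def most_similar(self, text, k=2):
        # Rank stored examples by Jaccard word overlap with the new text.
        query = set(text.lower().split())

        def score(example):
            words = set(example[0].lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.examples, key=score, reverse=True)[:k]


def build_prompt(store, text, fields, k=2):
    """Assemble a few-shot prompt from the k most similar stored examples."""
    lines = [f"Extract the following fields: {', '.join(fields)}.", ""]
    for ex_text, ex_entities in store.most_similar(text, k):
        lines += [f"Text: {ex_text}", f"Entities: {ex_entities}", ""]
    lines += [f"Text: {text}", "Entities:"]
    return "\n".join(lines)


store = KnowledgeStore()
store.add("Invoice number 123 issued by Acme Corp",
          {"invoice_number": "123", "vendor": "Acme Corp"})
store.add("Patient John Smith was admitted on Monday",
          {"patient": "John Smith"})
prompt = build_prompt(store, "Invoice number 456 issued by Globex",
                      ["invoice_number", "vendor"], k=1)
print(prompt)
```

Every corrected sample added with `store.add` enlarges the pool of demonstrations, which is one way the extraction could improve as the stored knowledge grows without retraining a model.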


tianchiguaixia commented 11 months ago

Hello, could you do this NER task first? Roughly when will I be able to try it out?

tianchiguaixia commented 11 months ago

Supporting NER tasks is very important, and there are many application scenarios. Can we support this first?

niklub commented 11 months ago

Hey, @tianchiguaixia, happy to implement that as soon as possible. Any ideas on how to match entities from text most efficiently using an LLM?
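
One possible approach to the matching question, sketched under the assumption that the LLM returns `(label, surface string)` pairs rather than character offsets: locate each returned string in the source text with an exact search and treat anything that cannot be found as a likely hallucination. `locate_entities` is a hypothetical helper, not an Adala API.

```python
import re


def locate_entities(text, entities):
    """Map (label, surface) pairs from an LLM to character spans in `text`.

    Returns (spans, unmatched): every exact occurrence of each surface
    string, plus the pairs whose surface string never appears in the text.
    """
    spans, unmatched = [], []
    for label, surface in entities:
        # An empty pattern would match at every position, so reject it.
        matches = list(re.finditer(re.escape(surface), text)) if surface else []
        if not matches:
            unmatched.append((label, surface))
            continue
        for m in matches:
            spans.append({"label": label, "text": surface,
                          "start": m.start(), "end": m.end()})
    spans.sort(key=lambda s: s["start"])
    return spans, unmatched


text = "Acme Corp hired John Smith. John Smith starts Monday."
llm_output = [("ORG", "Acme Corp"), ("PERSON", "John Smith"), ("ORG", "Globex")]
spans, unmatched = locate_entities(text, llm_output)
print(spans)
print(unmatched)
```

Exact matching is cheap and keeps the spans verifiable against the text, but it misses entities that the LLM normalizes or paraphrases; fuzzy matching would be a natural extension.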

tianchiguaixia commented 11 months ago

The entities in a text cannot all be matched in one pass, and the evaluation criteria also differ between projects. We need to let the model keep learning from each incoming sample so that it gets smarter over time. To evaluate how well it is learning, compare the human-corrected data with the predictions the model made for the same incoming data.
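
The evaluation described here could be sketched as entity-level precision, recall, and F1 between the model's predictions and the human-corrected annotations for the same sample. This is an illustrative assumption about the metric, and `entity_scores` is a hypothetical helper; entities count as correct only on an exact `(label, text)` match.

```python
def entity_scores(predicted, corrected):
    """Compare predicted entities with human-corrected ones.

    Both arguments are iterables of (label, text) pairs.
    """
    pred, gold = set(predicted), set(corrected)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = [("ORG", "Acme"), ("PERSON", "John")]
corrected = [("ORG", "Acme"), ("PERSON", "John Smith")]
scores = entity_scores(predicted, corrected)
print(scores)
```

Tracking these scores over successive batches of corrected samples gives a rough signal of whether the model is actually improving.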

tianchiguaixia commented 11 months ago

These are implementation examples I built using space_llm and the ChatGPT API:

https://github.com/HumanSignal/Adala/assets/29837553/7128f164-9684-4bda-aa14-9752a2700b59

tianchiguaixia commented 11 months ago

https://github.com/HumanSignal/Adala/assets/29837553/4798865d-f2fc-4203-ad45-219e645feab2

tianchiguaixia commented 11 months ago

https://github.com/HumanSignal/Adala/assets/29837553/5dece217-3ed0-4aac-a1ee-17865362e775

tianbuwei commented 10 months ago

> This is when I use space_llm and ChatGPT API to Implementation cases
>
> Rec.0004.mp4

Hello, may I ask how you feed the image information into ChatGPT? Are you using a multimodal model for this?

tianchiguaixia commented 10 months ago

No. I just use OCR + LLM directly.

tianbuwei commented 10 months ago

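> No. I just use OCR + LLM directly.

Hello, would you mind if I added you on WeChat? I would like to understand in detail how you implemented this. The features you showed on the page look really impressive.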

tianchiguaixia commented 10 months ago

tianchiguaixia2023

tianbuwei commented 10 months ago

> tianchiguaixia2023

Hello, is that your WeChat ID? I can't find you with it. Could you add me instead? Mine is tianhesuo

tianbuwei commented 10 months ago

> tianchiguaixia2023

Have you perhaps enabled the setting that prevents strangers from adding you?

Sean-Koval commented 9 months ago

I am interested in tackling the NER skill implementation if the task is still open. I have multiple use cases at work where something like this would be valuable.

niklub commented 9 months ago

Hello @Sean-Koval, @tianchiguaixia, @tianbuwei, we are going to implement a simple version of the NER skill in https://github.com/HumanSignal/Adala/pull/57. Happy to get your feedback and suggestions on it!

amenhere commented 6 months ago

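Hello, I am also working on LLM-based NER. My data is unstructured contract text. After LLM extraction, the results for some Party A / Party B labels are poor: for example, Party A and Party B get confused, or both are extracted as the same entity. If it is convenient, I would like to discuss this with you in detail.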
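
The failure modes mentioned here can at least be detected automatically after extraction. A minimal sketch, assuming the LLM output is parsed into a dict with hypothetical `party_a` and `party_b` keys; it flags suspicious results for review or a retry but cannot tell which party is which.

```python
def check_parties(extraction):
    """Return a list of problems found in a contract-party extraction."""
    issues = []
    # Treat missing keys and None values the same as empty strings.
    party_a = (extraction.get("party_a") or "").strip()
    party_b = (extraction.get("party_b") or "").strip()
    if not party_a:
        issues.append("party_a is missing")
    if not party_b:
        issues.append("party_b is missing")
    # Case-insensitive comparison catches trivially different spellings.
    if party_a and party_b and party_a.casefold() == party_b.casefold():
        issues.append("party_a and party_b are the same entity")
    return issues


result = {"party_a": "Acme Corp", "party_b": "acme corp"}
issues = check_parties(result)
print(issues)
```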

tianchiguaixia commented 6 months ago

Prompts.