argilla-io / argilla

Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
https://docs.argilla.io/en/latest/

[DOCS] Create a tutorial about using SpanMarker and Argilla #4086

Open sdiazlor opened 8 months ago

sdiazlor commented 8 months ago

Which page or section is this issue related to?

To create a tutorial about how to use SpanMarker and Argilla for NER.

davidberenstein1957 commented 7 months ago

https://huggingface.co/datasets?task_ids=task_ids%3Anamed-entity-recognition

davidberenstein1957 commented 7 months ago

https://docs.argilla.io/en/latest/tutorials/notebooks/ner_fine_tune_bert_beginners.html

davidberenstein1957 commented 7 months ago

https://www.numind.ai/blog/a-foundation-model-for-entity-recognition

davidberenstein1957 commented 7 months ago

https://huggingface.co/numind/generic-entity_recognition_NER-v1

Rami-Ismael commented 6 months ago

Hi there! I'm Rami Ismael, the individual behind the GitHub issues initiative as discussed here. I'm currently enjoying my winter break and have some free time on my hands. I'm keen on offering my assistance to help finalize the documentation. Would that be possible?

nataliaElv commented 5 months ago

Perhaps this tutorial will be more useful when we release the Spans Question for Feedback Datasets? Otherwise it will be outdated quite soon.

ceteri commented 4 months ago

We're building this https://github.com/DerwenAI/textgraphs which leverages SpanMarker and other LLM-based tasks in KG construction ... and if you notice the "report" this project has a very large Argilla-shaped puzzle piece missing in its center (why we needed the gradients for extracted entity and relation streams). I'd like to offer help on the SpanMarker + Argilla tutorial too.

louisguitton commented 4 months ago

I'd also like to offer help on this tutorial, whether on designing it, writing it or maintaining it.

My notes on writing a tutorial:

Proposed promise for this Argilla + SpanMarker tutorial

If you have the basic knowledge required to follow this tutorial (e.g. spaCy?), and you follow its directions, you will end up with a working Argilla Server, complete with a FeedbackDataset with Span Categorization Questions, with NER label Suggestions machine-generated by SpanMarker, ready for Annotators to add Responses. Advanced readers will be able to add Metadata or Vectors.

What do you think? What would be a good amount of required knowledge?

And we are waiting for Span Categorization to be released in the FeedbackDataset, right? Or did I miss this going live?
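To make the "Suggestions machine-generated by SpanMarker" part of the proposed promise more concrete, here is a minimal sketch of the conversion step only. It runs without either library installed. It assumes predictions arrive as dicts with `span`, `label`, `score`, `char_start_index` and `char_end_index` keys (the shape I understand SpanMarker's `predict` to return), and that a span suggestion needs `start`, `end`, `label` and `score`; both assumptions should be checked against the current docs. The helper `to_span_suggestions`, its parameters and the sample predictions are hypothetical and written by hand for illustration.

```python
from typing import Any, Dict, Iterable, List


def to_span_suggestions(
    entities: Iterable[Dict[str, Any]],
    allowed_labels: Iterable[str],
    min_score: float = 0.0,
) -> List[Dict[str, Any]]:
    """Hypothetical helper (not part of Argilla or SpanMarker).

    Maps SpanMarker-style entity dicts to plain span dicts, dropping
    labels the question does not define and low-confidence predictions.
    """
    allowed = set(allowed_labels)
    spans = []
    for ent in entities:
        if ent["label"] not in allowed or ent["score"] < min_score:
            continue
        spans.append(
            {
                "start": ent["char_start_index"],  # first character of the span
                "end": ent["char_end_index"],      # character right after the span
                "label": ent["label"],
                "score": ent["score"],
            }
        )
    # Sort by position so annotators see spans in reading order.
    return sorted(spans, key=lambda s: s["start"])


# Hand-written sample input in the assumed prediction shape (not real model output).
text = "Lionel Messi joined Inter Miami in 2023."
predictions = [
    {"span": "Inter Miami", "label": "ORG", "score": 0.91,
     "char_start_index": 20, "char_end_index": 31},
    {"span": "Lionel Messi", "label": "PER", "score": 0.98,
     "char_start_index": 0, "char_end_index": 12},
    {"span": "2023", "label": "DATE", "score": 0.40,
     "char_start_index": 35, "char_end_index": 39},
]

spans = to_span_suggestions(predictions, allowed_labels=["PER", "ORG"], min_score=0.5)
for s in spans:
    print(s["label"], repr(text[s["start"]:s["end"]]), s["score"])
```

In a real tutorial, the resulting spans would be attached to each record as the suggestion for the span question, and the label list would come from the question definition itself.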

nataliaElv commented 4 months ago

Hi @louisguitton! Yes, we're working on releasing a Spans question for Feedback datasets, and once that's out, we can start working on the tutorial. I think it would be highly beneficial for the adoption of this feature if the tutorial were published soon after the release.

I'll leave it to @davidberenstein1957 and @sdiazlor to tell you if they need any help with this one or if it's something they prefer to do internally.

dvsrepo commented 4 months ago

Very cool notes about what a tutorial should be, @louisguitton, fully agree!

We used to use tutorials more as blog posts to promote and introduce Argilla to new users on social media, but that has created a bit of a mismatch now. We'll take it into account for version 2.0 of the docs (unstarted but planned).

sdiazlor commented 3 months ago

Hi, @louisguitton. Thanks for your notes! Any feedback is always welcome. Feel free to work on this tutorial and let us know if you have any questions.

dvsrepo commented 3 months ago

Let's wait for the new SpanQuestion

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 90 days with no activity.

louisguitton commented 1 week ago

Since we started this discussion, SpanQuestion was released.

The football news dataset, the code snippets I contributed in the talk, and the structure of the talk can all be used to create a tutorial. A part 2 of the talk was also discussed, to address some of the parts I didn't have time to cover: training a model, using weak supervision with skweak, doing KG construction with the entities found, etc.

The scoping exercise (i.e. splitting into parts and making sure we deliver small and incremental value) for NER is key I think, so any input from User feedback, Customer needs or Product vision is welcome to help prioritise.