[DOCS] Create a tutorial about using SpanMarker and Argilla

sdiazlor commented 8 months ago

Which page or section is this issue related to?

To create a tutorial about how to use SpanMarker and Argilla for NER

davidberenstein1957 commented 7 months ago

https://huggingface.co/datasets?task_ids=task_ids%3Anamed-entity-recognition

davidberenstein1957 commented 7 months ago

https://docs.argilla.io/en/latest/tutorials/notebooks/ner_fine_tune_bert_beginners.html

davidberenstein1957 commented 7 months ago

https://www.numind.ai/blog/a-foundation-model-for-entity-recognition

davidberenstein1957 commented 7 months ago

https://huggingface.co/numind/generic-entity_recognition_NER-v1

Rami-Ismael commented 6 months ago

Hi there! I'm Rami Ismael, the individual behind the GitHub issues initiative as discussed here. I'm currently enjoying my winter break and have some free time on my hands. I'm keen on offering my assistance to help finalize the documentation. Would that be possible?

nataliaElv commented 5 months ago

Perhaps this tutorial will be more useful when we release the Spans Question for Feedback Datasets? Otherwise it will be outdated quite soon.

ceteri commented 4 months ago

We're building this https://github.com/DerwenAI/textgraphs which leverages SpanMarker and other LLM-based tasks in KG construction ... and if you notice the "report" this project has a very large Argilla-shaped puzzle piece missing in its center (why we needed the gradients for extracted entity and relation streams). I'd like to offer help on the SpanMarker + Argilla tutorial too.

louisguitton commented 4 months ago

I'd also like to offer help on this tutorial, whether on designing it, writing it or maintaining it.

My notes on writing a tutorial:

the end of the tutorial must be meaningful and achievable to a beginner
having done the tutorial, the reader is in a position to make sense of the rest of the documentation and of Argilla itself
objective = turning learners into users, get the learner started on their Argilla journey not to their destination
Tutorials need to be useful for the beginner, easy to follow, meaningful and extremely robust, and kept up-to-date
build from the simplest tools or operations to the most complex
be concrete, built with specificity in mind, don't explain anything the learner doesn't need to know to complete the tutorial (e.g. Argilla telemetry)
Note that it doesn’t tell you what you will learn, just what you will do. The learning comes out of that doing.

Proposed promise for this Argilla + SpanMarker tutorial

if you have the basic knowledge required to follow this tutorial (e.g. spaCy?), and you follow its directions, you will end up with a working Argilla Server, complete with a FeedbackDataset with Span Categorization Questions, with NER label Suggestions machine-generated by SpanMarker, ready for Annotators to add Responses. Advanced readers will be able to add Metadata or Vectors.
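The "NER label Suggestions machine-generated by SpanMarker" step could be sketched roughly as below. This is a minimal sketch, not the tutorial's actual code: it assumes each SpanMarker prediction is a dict with `label`, `score`, `char_start_index` and `char_end_index` keys, and the helper name `predictions_to_span_suggestions` and the sample data are hypothetical. It deliberately avoids importing Argilla or SpanMarker so it runs on its own.

```python
from typing import Any


def predictions_to_span_suggestions(
    predictions: list[dict[str, Any]], min_score: float = 0.0
) -> list[dict[str, Any]]:
    """Turn SpanMarker-style entity dicts into plain span dicts.

    Assumes each prediction carries `label`, `score`,
    `char_start_index` and `char_end_index` (an assumption about the
    model output shape, not something stated in this thread).
    """
    spans = []
    for entity in predictions:
        # Drop low-confidence entities so annotators see fewer noisy suggestions.
        if entity["score"] < min_score:
            continue
        spans.append(
            {
                "start": entity["char_start_index"],
                "end": entity["char_end_index"],
                "label": entity["label"],
                "score": round(float(entity["score"]), 4),
            }
        )
    # Sort by position so the suggestions follow reading order.
    return sorted(spans, key=lambda s: (s["start"], s["end"]))


# Hand-written example in the assumed shape (not real model output).
text = "Lionel Messi joined Inter Miami in 2023."
example_predictions = [
    {"span": "Inter Miami", "label": "ORG", "score": 0.91,
     "char_start_index": 20, "char_end_index": 31},
    {"span": "Lionel Messi", "label": "PER", "score": 0.98,
     "char_start_index": 0, "char_end_index": 12},
]
suggestions = predictions_to_span_suggestions(example_predictions, min_score=0.5)
```

In a real tutorial, each resulting dict would still need to be wrapped in whatever span value type the FeedbackDataset span question expects for suggestions; that part depends on the released Argilla API and is left out here.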

What do you think? What would be a good amount of prior knowledge to require?

And we are waiting for Span Categorization to be released in the FeedbackDataset, right? Or did I miss this going live?

nataliaElv commented 4 months ago

hi @louisguitton ! Yes, we're working on releasing a Spans question for Feedback datasets and once that's out, we can start working on the tutorial. I think it would be highly beneficial for the adoption of this feature that the tutorial is published soon after the release.

I'll leave it to @davidberenstein1957 and @sdiazlor to tell you if they need any help with this one or if it's something they prefer to do internally.

dvsrepo commented 4 months ago

Very cool notes about what a tutorial should be @louisguitton , fully agree!

We used to use tutorials more as a blog post to promote and introduce argilla to new users on social media but that has created a bit of a mismatch now. We'll take it into account for version 2.0 of the docs (unstarted but planned)

sdiazlor commented 3 months ago

Hi, @louisguitton. Thanks for your notes! Any feedback is always welcome. Feel free to work on this tutorial and let us know if you have any questions.

dvsrepo commented 3 months ago

Let's wait for the new SpanQuestion

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 90 days with no activity.

louisguitton commented 1 week ago

Since we started this discussion, SpanQuestion was released

V1.26 Mar 22, 2024 SpanQuestion is now part of FeedbackDataset #4617, #4623, #4622
V1.27 Apr 18, 2024 Overlapping spans are now possible #4668, #4697
V1.28 May 9, 2024 Span improvements #4735, #4726
NER on Argilla - meetup talk - Louis Guitton with code available here (not cleaned up) https://github.com/louisguitton/mlops-talk-llm-kg/tree/main/notebooks/argilla_talk

The football news dataset, the code snippets I contributed in the talk, and the structure of the talk can all be used to create a tutorial. A part 2 of the talk was also discussed, to address some of the parts I didn't have time to cover: training a model, using weak supervision with skweak, doing KG construction with the entities found, etc.

The scoping exercise (i.e. splitting in parts and making sure we deliver small and incremental value) for NER is key I think, so any input from User feedback or Customer needs or Product vision is welcome to help prioritise.

argilla-io / argilla

[DOCS] Create a tutorial about using SpanMarker and Argilla #4086
