Closed GinoWoz1 closed 1 year ago
The issue seems to be that the dataset name that you're loading (safety_testing
) differs from the dataset that you pushed records to feedback_testing
. I suspect this should work if you use the same name for both.
I updated @tomaarsen , I mistakenly didnt update my test code for the example ( I was using my table name locally rather than the test code). It doesnt work when they match.
I can reproduce this locally - also with different datasets. I'll message the team internally, as I'm not sure what's causing this. Thank you for reporting this!
Hi @GinoWoz1 , thanks for raising this question. We should make this more clear.
The FeedbackDataset is different from the rest of the dataset types (TextClassification, Token, and so on).
The records are not retrieved with rg.load
but using the methods described here:
https://docs.argilla.io/en/latest/guides/llms/practical_guides/collect_responses.html
Let me know if this works for you
And if you still have the dataset instance (as in your example) you can use:
dataset.fetch_records()
In any case, we should provide a much more explicit error message explaining and guiding the user towards the solution:
"rg.load is not supported for FeedbackDatasets, you should use from_argilla, or ds.fetch_records() ... instead."
cc @alvarobartt @gabrielmbmb
Ahh, so that's how it works. I tried to find it in the recent blogpost, but records are never fetched.
Ahh, so that's how it works. I tried to find it in the recent blogpost, but records are never fetched.
Hi @tomaarsen the records are fetched by default when you call FeedbackDataset.from_argilla
classmethod unless you provide the flag with_records=False
, while for FeedbackDataset.from_huggingface
those are fetched by default (as in every HuggingFace dataset).
In any case, we should provide a much more explicit error message explaining and guiding the user towards the solution:
"rg.load is not supported for FeedbackDatasets, you should use from_argilla, or ds.fetch_records() ... instead."
cc @alvarobartt @gabrielmbmb
Regarding this, we're currently using two API versions at the same time, so when we list the datasets we either list: TextClassificationDataset
, TokenClassificationDataset
, and Text2TextDataset
, or FeedbackDataset
, so not sure if there's a clean way to prevent this with a warning.
Besides that, everything's explained in the docs at https://docs.argilla.io/en/latest/guides/llms/practical_guides/export_dataset.html
So as mentioned in my comments above @GinoWoz1, in order to load the new FeedbackDataset
you no longer need to rely on rg.load
, but on FeedbackDataset.from_argilla
or FeedbackDataset.from_huggingface
classmethods to retrieve the dataset configuration and the records (unless you specify with_records=False
in from_argilla
)
Let us know if you have any other doubt, or if something's not clear enough in the documentation so that we can update it! 😄
@alvarobartt maybe we can add a try/except
block in rg.load
and do an extra call to the API to check if there is a Feedback Dataset with the provided name and warn the user.
Awesome, thanks so much, it worked! I was able to get the feedback dataset. Will FeedbackDataset.to_pandas() ever be implemented ? I can deal with the object now no problem, but curious.
@alvarobartt maybe we can add a
try/except
block inrg.load
and do an extra call to the API to check if there is a Feedback Dataset with the provided name and warn the user.
Sure, we can do that! Do you want to pick that up @gabrielmbmb? 😄
Awesome, thanks so much, it worked! I was able to get the feedback dataset. Will FeedbackDataset.to_pandas() ever be implemented ? I can deal with the object now no problem, but curious.
Hi @GinoWoz1, glad that it worked! So we will add it soon, in the meantime there's support for HuggingFace datasets
with the new format_as
function, so you can dataset.format_as("datasets")
👍🏻 Note that it just works for the new FeedbackDataset
s, for the rest of them you can keep on using the rg.
functions!
The issue seems solved, so I'm closing it.
Since feedback datasets are used totally differently from the previous one, I think it makes sense to separate and combine all related operations under the argilla.feedback
module and avoid mixing with the rest of the datasets.
import argilla as rg
from argilla.feedback import FeedbackDataset
rg.init(....)
ds = FeedbackDataset. from_argilla(...)
You could eventually have specific trainers, unification strategies, or other specific feedback dataset functionality under the same module.
from argilla.feedback.trainer import ...
from argilla.feedback.whatever_cool_thing_only_for_feedback import ...
This way can help users to understand separation from previous Argilla capabilities.
what do you think @alvarobartt @dvsrepo @gabrielmbmb @davidberenstein1957
@frascuchon I agree, but we would also be helped a lot w.r.t. and warnings on top of that.
I already implemented some import related stuff w.r.t. the Trainer
as discussed during the review.
Describe the bug After creating a feedback data set when I try to use rg.load(name = feedbackDatasetName) I get an error.
Stacktrace and Code to create the bug
Error is:
NotFoundApiError: Argilla server returned an error with http status: 404 Error details: [{'code': 'argilla.api.errors::EntityNotFoundError', 'params': {'name': 'safety_testing', 'type': 'ServiceDataset'}}]
Expected behavior I expect to read a dataset like I do any other dataset
Environment: