Closed: mpjuhasz closed this issue 1 month ago
Hi @mpjuhasz, thank you for reporting. Krippendorff's Alpha is calculated based on the responses, so fields should not be affected. For instance, this is a practical example to calculate it. What type of question are you using?
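To illustrate why fields should not matter: a from-scratch sketch of Krippendorff's alpha for nominal labels (the function name and everything below is illustrative, not Argilla's actual implementation) shows that only (annotator, item, label) triples enter the computation, so the dataset's field contents are irrelevant to the metric itself:

```python
from collections import Counter, defaultdict


def krippendorff_alpha_nominal(triples):
    """Krippendorff's alpha for nominal data.

    `triples` is a list of (coder, item, label) tuples, the same shape
    that nltk's AnnotationTask consumes.
    """
    # Group labels per item; only items with >= 2 ratings are pairable.
    by_item = defaultdict(list)
    for _coder, item, label in triples:
        by_item[item].append(label)

    # Build the coincidence matrix: each pair of labels within an item
    # contributes 1 / (m - 1) per ordered pair, where m is the number of
    # ratings for that item.
    coincidences = Counter()
    for labels in by_item.values():
        m = len(labels)
        if m < 2:
            continue
        counts = Counter(labels)
        for c, n_c in counts.items():
            for k, n_k in counts.items():
                pairs = n_c * (n_k - 1) if c == k else n_c * n_k
                coincidences[(c, k)] += pairs / (m - 1)

    n_total = sum(coincidences.values())
    if n_total <= 1:
        raise ValueError("Need at least one item with two or more ratings.")

    marginals = Counter()
    for (c, _k), v in coincidences.items():
        marginals[c] += v

    # alpha = 1 - observed disagreement / expected disagreement
    observed = sum(v for (c, k), v in coincidences.items() if c != k) / n_total
    expected = sum(
        marginals[c] * marginals[k]
        for c in marginals
        for k in marginals
        if c != k
    ) / (n_total * (n_total - 1))
    return 1.0 - observed / expected if expected else 1.0
```

Perfect agreement yields 1.0; agreement at chance level yields 0.0. Note that nothing in the computation touches a `"text"` field, which is why the hardcoded lookup discussed below is surprising.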
@sdiazlor, thanks for the quick response. Yes, that's what I was expecting, yet it seems to look for this `text` field. I'm using `label_selection` type questions. Here's an example question from the `argilla.yaml`:
```yaml
- description: null
  id: <uuid>
  labels:
    DONT_KNOW: Don't know
    'NO': 'No'
    'YES': 'Yes'
  name: toxicity
  required: true
  title: Does the response contain any toxic, harmful, or inappropriate content?
  type: label_selection
  visible_labels: null
```
@mpjuhasz Thank you, I'll check it.
@nataliaElv, potential bug fix that needs to happen
@plaguss any ideas on what's happening?
@frascuchon @plaguss As far as I could check, when converted to a `Dataset`, the columns take the field name. If the field name is not `"text"`, it won't match, raising the error:
```python
hf_dataset = dataset.format_as("datasets")
formatted_responses: FormattedResponses = []
for row in hf_dataset:
    responses_ = row[question_name]
    question_text = row["text"]  # fails when the dataset has no "text" field
    for response in responses_:
        user_id = response["user_id"]
        if user_id is None:
            raise ValueError(
                "Please push your dataset to argilla to have the user_id necessary for this computation."
            )
```
I'll work on this one.
Hi @frascuchon @sdiazlor, I was reviewing the function: it is supposed to generate a data structure to be ingested by nltk's `AnnotationTask` class. The `question_text` corresponds to the "question_id" to be annotated in this task. I reviewed the tests from the corresponding PR (https://github.com/argilla-io/argilla/pull/4271/files#diff-50ab5090f045fe6dbc539fc2d511c315b7d0d248006198d0257ccc6718e5663cR88), and apparently the key `"text"` was used because that was the name of a field in the sample dataset used in the tests:
```python
@pytest.fixture
def feedback_dataset_fields() -> List["AllowedFieldTypes"]:
    return [
        TextField(name="text", required=True),
        TextField(name="label", required=True),
    ]
```
I cannot find a reason why `"text"` is hardcoded there, really. It could become an optional argument of the function, with that name as the default, or, as Sara mentions, since the field is not used for the alpha computation itself, it could be filled with a placeholder.
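The two options above could look like this minimal sketch (the helper name and parameters are hypothetical, purely to illustrate the suggestion):

```python
def extract_question_text(row, field_name="text", placeholder="N/A"):
    """Resolve the item identifier used by the agreement computation.

    Option 1: let callers override the hardcoded "text" via `field_name`,
    keeping "text" as the default for backwards compatibility.
    Option 2: fall back to a placeholder when the field is absent, since
    the field's content is not used by the alpha computation itself.
    """
    return row.get(field_name, placeholder)
```

Option 1 preserves the current behaviour for datasets that do have a `text` field, while option 2 at least stops the `KeyError` for datasets that don't.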
**Describe the bug**
I'm trying to run the agreement metrics on a `FeedbackDataset`. When running `metric.compute("alpha")`, I'm seeing the issue below: my dataset has no `"text"` field. Updating the failing line (`argilla/client/feedback/metrics/agreement_metrics.py:112`) fixes the issue (or at least it runs and produces some result). From the context I understood that the `question_text` is to be used as a question id. Now, my dataset has a question id in the metadata which I'm using above, but there was no way to instruct Argilla to use it.

**Stacktrace and Code to create the bug**

**Expected behavior**
The user is able to provide a unique id of the question for this step of the process, or a unique id is generated based on the available fields (not necessarily `"text"`).

**Environment:**

**Additional context**
It might be relevant (though, based on my understanding, not impacting the behaviour) that the `FeedbackDataset` is in fact pulled from the Huggingface hub when this occurs.