argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[FEATURE] Update small-text tutorials #3442

Closed: chschroeder closed this issue 1 year ago

chschroeder commented 1 year ago

There are two notebooks that reference small-text versions v1.1.0/v1.1.1.

I will likely create a PR later today or tomorrow. This issue is just for documentation purposes.

Is your feature request related to a problem? Please describe.

I just published the bugfix release v1.3.1, which brings important bugfixes for transformer-based classification. Moreover, the additions from v1.2.0 will become accessible as well.

Describe the solution you'd like

Two notebooks need to be updated:

Describe alternatives you've considered

--

Additional context

--

chschroeder commented 1 year ago

It seems that I, on the other hand, was a few Argilla versions behind. This is more complicated than just raising the version number and running the notebook once.

When I execute the following part from the first notebook:

```python
import argilla as rg
from datasets import load_dataset

# `trec` is assumed to be the Hugging Face TREC dataset, loaded earlier in the notebook
trec = load_dataset("trec")

# Choose a name for the dataset
DATASET_NAME = "trec_with_active_learning"

# Define labeling schema
labels = trec["train"].features["coarse_label"].names
settings = rg.TextClassificationSettings(label_schema=labels)

# Create dataset with a label schema
rg.configure_dataset_settings(name=DATASET_NAME, settings=settings)
```

Then, rg.configure_dataset_settings() raises the following error:

```
BadRequestApiError: Argilla server returned an error with http status: 400
Error details: [{'code': 'argilla.api.errors::MissingInputParamError', 'params': {'message': 'A workspace must be provided'}}]
```

Can this be used without setting up an explicit workspace? As far as I understand, using workspaces would require additional setup steps, which I would like to avoid.

davidberenstein1957 commented 1 year ago

@chschroeder thanks for taking a look at this. A PR is very welcome.

davidberenstein1957 commented 1 year ago

I think the above can be resolved by passing a workspace parameter to rg.configure_dataset_settings(). Alternatively, you could call rg.set_workspace().

chschroeder commented 1 year ago

Thanks for the feedback, but the error message had already told me that a workspace could help. What I was trying to ask is rather: is this really needed? Is this the right way?

Next, I tried it by following this example: https://docs.argilla.io/en/latest/guides/llms/examples/curating-feedback-instructiondataset.html

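```python
import argilla as rg

username = "argilla"  # placeholder: the user whose workspace should be used

try:
    # Reuse the workspace if it already exists
    workspace = rg.Workspace.from_name(username)
except Exception:
    # Otherwise create it and add the user to it
    workspace = rg.Workspace.create(username)
    user = rg.User.from_name(username)
    workspace.add_user(user.id)
```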

Here rg.Workspace.create() does not work with the default admin user:

```
PermissionError: User with role=admin is not allowed to call `create`. Only users with role=[<UserRole.owner: 'owner'>] are allowed to call this function.
```

Of course I could add a user now, but this adds two extra steps (creating a workspace and creating a user) to the notebook compared to before.

Is this really necessary? And if so, why does it seem not to be necessary for other tutorials?

Edit: The same happens for rg.User.from_name(). This is really confusing and not clear from reading the docs.

```
PermissionError: User with role=admin is not allowed to call `create`. Only users with role=[<UserRole.owner: 'owner'>] are allowed to call this function.
```
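The error text describes a role allow-list: the call checks the caller's role against a list of permitted roles and raises `PermissionError` otherwise. Below is a hypothetical pure-Python model of that behavior, not Argilla's actual implementation; `allowed_for_roles`, `create`, and the two-member `UserRole` enum are illustrative names chosen only to reproduce the message quoted above.

```python
from enum import Enum
from functools import wraps


class UserRole(Enum):
    # Only the two roles that appear in the error messages above
    owner = "owner"
    admin = "admin"


def allowed_for_roles(*roles):
    """Hypothetical decorator: only callers whose role is in `roles` may call the function."""
    def decorator(func):
        @wraps(func)
        def wrapper(current_role, *args, **kwargs):
            if current_role not in roles:
                raise PermissionError(
                    f"User with role={current_role.value} is not allowed to call `{func.__name__}`. "
                    f"Only users with role={list(roles)} are allowed to call this function."
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@allowed_for_roles(UserRole.owner)
def create(name):
    # Stand-in for a workspace-creating call; it just returns the name
    return name
```

Under this model, `create(UserRole.owner, "trec")` succeeds while `create(UserRole.admin, "trec")` raises the same `PermissionError` text as above, so the call has to be made as a user whose role is on the allow-list (here `owner`) rather than as `admin`.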