alvarobartt closed this issue 1 year ago
Can I pick this issue? API_URL will be /api/v1/dataset/, right? (referred from src/argilla/client/sdk/v1/datasets/api.py)
If not, I'll need some time to find all the references to API_URL, so a hint would help.
Feel free to pick this up, @krishnajalan! Just note that the API_URL is the one specified in rg.init(api_url=...), which can be retrieved as rg.active_client().api_url 😄
Hey @alvarobartt, I got an error when accessing rg.active_client().api_url:
'Argilla' object has no attribute 'api_url'
Can we use rg.active_client().http_client.base_url instead?
True @krishnajalan, feel free to use rg.active_client().http_client.base_url instead. I thought we were setting self.api_url, but it seems that it's directly injected into the httpx client 👍🏻
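A minimal sketch of turning that base URL into a link to a dataset, assuming the `<API_URL>/<DATASET_ID>/annotation-mode` template from the issue description. `build_dataset_url` is a hypothetical helper, not part of the Argilla client, and the exact path layout may differ between Argilla versions.

```python
from uuid import UUID


def build_dataset_url(base_url: str, dataset_id: UUID) -> str:
    """Hypothetical helper: build a dataset link following the
    <API_URL>/<DATASET_ID>/annotation-mode template from the issue."""
    # str() lets this accept an httpx.URL (e.g. http_client.base_url) as well as a str;
    # stripping the trailing slash avoids a double "//" in the result
    base = str(base_url).rstrip("/")
    return f"{base}/{dataset_id}/annotation-mode"
```

In a live session, `base_url` would be `rg.active_client().http_client.base_url`, as suggested above, and `dataset_id` the ID of the dataset that was just created.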
Hey @alvarobartt, some of the unit tests are failing, and they are unrelated to my changes. Did I forget some steps?
FAILED tests/training/test_span_marker.py::test_evaluate_train_test - TypeError: string indices must be integers
FAILED tests/training/test_span_marker.py::test_train_no_model - TypeError: string indices must be integers
FAILED tests/training/test_span_marker.py::test_various_inputs - TypeError: string indices must be integers
==== 4 failed, 1074 passed, 39 skipped, 10385 warnings in 710.44s (0:11:50) ====
Error: Process completed with exit code 1.
Hi @krishnajalan, yes, it seems unrelated, I'll have a look at those, thanks for reporting!
Any update on this @krishnajalan? We'd love to include it in the next Argilla release! 🔥
Yep, I have made the commit and will create the PR and put it up for review today, but the unit tests are failing. Will it be fine if I create the PR with failing unit tests?
Thanks @krishnajalan, you can create the PR, and the failing unit tests won't matter if they're unrelated; otherwise, those should pass before merging into develop. But anyway, feel free to create the PR as a draft so that we can review it and help you with the unit tests if needed 😄
Hi! When we first discussed this, I was thinking about our previous behavior: if you are pushing data with a script or from a notebook, you can easily click the link and go to the dataset.
So this is how I see it:
So it's more about showing info messages than making the methods return the URL (which is also fine I guess but not as useful).
For inspiration, if I recall correctly, wandb shows a nicely formatted table with the links to the run experiment.
@krishnajalan maybe you can have a look at the comment above from @dvsrepo where he shares his thoughts on the next steps to tackle the current issue 😄
Will printing the formatted URL work, e.g. print(f"Argilla Dataset URL: {url}")? It will be clickable, but is this the right way?
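A minimal, stdlib-only sketch of a slightly richer info message, loosely inspired by the wandb-style summary table mentioned earlier in the thread. `format_push_summary` is a hypothetical helper; the field names are illustrative assumptions, not the messages rg.log actually prints.

```python
def format_push_summary(name: str, workspace: str, url: str) -> str:
    """Hypothetical helper: render a small aligned key/value summary
    to print after pushing a dataset."""
    rows = [("Dataset", name), ("Workspace", workspace), ("URL", url)]
    # Pad every key to the longest one so the values line up
    width = max(len(key) for key, _ in rows)
    return "\n".join(f"{key.ljust(width)} : {value}" for key, value in rows)


print(format_push_summary("my-dataset", "my-workspace", "http://localhost:6900/my-dataset"))
```

Keeping the URL as plain text at the end of its own line is deliberate: Jupyter and many terminals turn bare URLs into clickable links without any extra markup.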
This is the strategy we use for rg.log.
@alvarobartt, please confirm we can/should use the same approach.
Yes, indeed we can use the same approach. I'm not sure about the Failed count, but for the rest feel free to re-use those messages, @krishnajalan.
@alvarobartt, this could be closed, right?
Is this really done? Could you point me to the PR tackling this specific issue?
Hi @dvsrepo, so now we're just returning the RemoteFeedbackDataset, i.e. a FeedbackDataset in Argilla, and it has a url property, so one can do:
remote_dataset = dataset.push_to_argilla(name="my-dataset", workspace="my-workspace")
remote_dataset.url
So, as we return the remote object instead, we are not returning the URL, but we can create a mini-PR just to print it out automatically when pushing to Argilla, even though users can also just access remote_dataset.url. WDYT? 😄
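The "mini-PR" idea can be sketched as a thin wrapper that pushes, prints the URL, and still returns the remote object, so both behaviours discussed here coexist. `push_and_report` is hypothetical, not an Argilla API; it relies only on the `push_to_argilla(name=..., workspace=...)` call and the `url` property shown above, and the stub classes exist solely to exercise it without a server.

```python
from dataclasses import dataclass
from typing import Any


def push_and_report(dataset: Any, name: str, workspace: str) -> Any:
    """Hypothetical helper: push a FeedbackDataset, print where it landed,
    and return the RemoteFeedbackDataset so callers keep full access to it."""
    remote_dataset = dataset.push_to_argilla(name=name, workspace=workspace)
    # A plain URL is usually rendered as a clickable link in notebooks and terminals
    print(f"Argilla Dataset URL: {remote_dataset.url}")
    return remote_dataset


# Stand-ins used only to try the helper without an Argilla server
@dataclass
class StubRemoteDataset:
    url: str


class StubDataset:
    def push_to_argilla(self, name: str, workspace: str) -> StubRemoteDataset:
        return StubRemoteDataset(url="http://localhost:6900/stub")


remote = push_and_report(StubDataset(), name="my-dataset", workspace="my-workspace")
```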
@alvarobartt I think both would be good.
@alvarobartt yes, maybe the title/description of the issue was misleading, but what I meant was to improve usability by showing (printing) a clickable URL pointing at the dataset just updated/created.
Also, using the previous rg.log vs push_to_argilla a few days ago, I noticed that the progress bar of rg.log looks nicer (using Colab and Jupyter notebooks within VS Code). Are we using the same library/function? If we are not using Rich for the new Feedback task progress bars, I think we should.
You can create an issue covering these two enhancements and tag it as good first issue:
Sure @dvsrepo, I'll create those and then close this one in favour of them! Thanks for reporting and following up!
Perfect @alvarobartt !
Hi @alvarobartt, I am trying to push a dataset to my Argilla endpoint as follows:
ds = rg.FeedbackDataset.from_huggingface("vegeta/testargilla")
ds.push_to_argilla(name="hf-vegeta", workspace="test-workspace")
I am getting the following error, and I am not sure why it is raising a deletion-related error:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/xxxxxx/site-packages/argilla/client/feedback/da │
│ taset/local/mixins.py:214 in __publish_dataset │
│ │
│ 211 │ @staticmethod │
│ 212 │ def __publish_dataset(client: "httpx.Client", id: UUID) -> None: │
│ 213 │ │ try: │
│ ❱ 214 │ │ │ datasets_api_v1.publish_dataset(client=client, id=id) │
│ 215 │ │ except Exception as e: │
│ 216 │ │ │ ArgillaMixin.__delete_dataset(client=client, id=id) │
│ 217 │ │ │ raise Exception(f"Failed while publishing the
FeedbackDataset in Argilla w │
│ │
│ /Users/xxxxxxx/site-packages/argilla/client/sdk/v1/data │
│ sets/api.py:139 in publish_dataset │
│ │
│ 136 │ │ response_obj = Response.from_httpx_response(response) │
│ 137 │ │ response_obj.parsed = FeedbackDatasetModel(response.json()) │
│ 138 │ │ return response_obj │
│ ❱ 139 │ return handle_response_error(response) │
│ 140 │
│ 141 │
│ 142 def list_datasets( │
│ │
│ /Users/xxxxxx/site-packages/argilla/client/sdk/commons │
│ /errors_handler.py:63 in handle_response_error │
│ │
│ 60 │ │ error_type = GenericApiError │
│ 61 │ else: │
│ 62 │ │ raise HttpResponseError(response=response) │
│ ❱ 63 │ raise error_type(**error_args) │
│ 64 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ForbiddenApiError: Argilla server returned an error with http status: 403. Error details: {'response': '<!doctype html><meta name=viewport
content="width=device-width, initial-scale=1">
During handling of the above exception, another exception occurred:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/xxxxx/site-packages/argilla/client/feedback/da │
│ taset/local/mixins.py:92 in __delete_dataset │
│ │
│ 89 │ @staticmethod │
│ 90 │ def __delete_dataset(client: "httpx.Client", id: UUID) -> None: │
│ 91 │ │ try: │
│ ❱ 92 │ │ │ datasets_api_v1.delete_dataset(client=client, id=id) │
│ 93 │ │ except Exception as e: │
│ 94 │ │ │ raise Exception( │
│ 95 │ │ │ │ f"Failed while deleting the FeedbackDataset
with ID '{id}' from Argill │
│ │
│ /Users/xxxxxxxxx/site-packages/argilla/client/sdk/v1/data │
│ sets/api.py:113 in delete_dataset │
│ │
│ 110 │ │
│ 111 │ if response.status_code == 200: │
│ 112 │ │ return Response.from_httpx_response(response) │
│ ❱ 113 │ return handle_response_error(response) │
│ 114 │
│ 115 │
│ 116 def publish_dataset( │
│ │
│ /Users/xxxxxxx/site-packages/argilla/client/sdk/commons │
│ /errors_handler.py:63 in handle_response_error │
│ │
│ 60 │ │ error_type = GenericApiError │
│ 61 │ else: │
│ 62 │ │ raise HttpResponseError(response=response) │
│ ❱ 63 │ raise error_type(**error_args) │
│ 64 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ForbiddenApiError: Argilla server returned an error with http status: 403. Error details: {'response': '<!doctype html><meta name=viewport
content="width=device-width, initial-scale=1">
The above exception was the direct cause of the following exception:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/xxxxx/site-packages/argilla/client/feedback/da │
│ taset/local/mixins.py:258 in push_to_argilla │
│ │
│ 255 │ │ │ │ │ vectors_settings=self.vectors_settings, client=httpx_client, id=crea │
│ 256 │ │ │ │ ) │
│ 257 │ │ │ │
│ ❱ 258 │ │ │ ArgillaMixin.publish_dataset(client=httpx_client, id=created_dataset.id) │
│ 259 │ │ │ │
│ 260 │ │ │ # TODO: Remote dataset should connect all settings by API calls requested on │
│ 261 │ │ │ # Once is done, this prefetch info should be removed. │
│ │
│ /Users/xxxx/site-packages/argilla/client/feedback/da │
│ taset/local/mixins.py:216 in publish_dataset │
│ │
│ 213 │ │ try: │
│ 214 │ │ │ datasets_api_v1.publish_dataset(client=client, id=id) │
│ 215 │ │ except Exception as e: │
│ ❱ 216 │ │ │ ArgillaMixin.delete_dataset(client=client, id=id) │
│ 217 │ │ │ raise Exception(f"Failed while publishing the FeedbackDataset
in Argilla w │
│ 218 │ │
│ 219 │ def push_to_argilla( │
│ │
│ /Users/xxxxxx/site-packages/argilla/client/feedback/da │
│ taset/local/mixins.py:94 in delete_dataset │
│ │
│ 91 │ │ try: │
│ 92 │ │ │ datasets_api_v1.delete_dataset(client=client, id=id) │
│ 93 │ │ except Exception as e: │
│ ❱ 94 │ │ │ raise Exception( │
│ 95 │ │ │ │ f"Failed while deleting the FeedbackDataset
with ID '{id}' from Argill │
│ 96 │ │ │ ) from e │
│ 97 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Failed while deleting the FeedbackDataset
with ID 'xxxxxxx' from Argilla with exception: Argilla server returned an error
with http status: 403. Error details: {'response': '<!doctype html>
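The traceback above is confusing because two requests fail, not one. Judging from the frames shown, push_to_argilla calls publish_dataset, and when publishing fails the client tries to roll back by deleting the just-created dataset; the deletion is rejected with the same 403, so the deletion error is the one that finally surfaces. A stdlib-only sketch reproducing that control flow, using local stand-ins rather than the real Argilla functions:

```python
from typing import Optional


class ForbiddenApiError(Exception):
    """Local stand-in for the client's 403 error (not the real Argilla class)."""


def publish_dataset(dataset_id: str) -> None:
    # Simulate the server rejecting the publish request
    raise ForbiddenApiError(f"403 while publishing {dataset_id}")


def delete_dataset(dataset_id: str) -> None:
    try:
        # Simulate the server also rejecting the cleanup request
        raise ForbiddenApiError(f"403 while deleting {dataset_id}")
    except ForbiddenApiError as e:
        raise Exception(f"Failed while deleting dataset {dataset_id}") from e


def push(dataset_id: str) -> None:
    try:
        publish_dataset(dataset_id)
    except Exception as e:
        # Roll back the half-created dataset; if this raises, the line below never runs
        delete_dataset(dataset_id)
        raise Exception(f"Failed while publishing dataset {dataset_id}") from e


def surfaced_error(dataset_id: str) -> Optional[BaseException]:
    """Return the exception a caller of push() actually sees."""
    try:
        push(dataset_id)
    except Exception as err:
        return err
    return None
```

The surfaced exception is the deletion failure; the original publish 403 survives only as the implicit `__context__` of the deletion 403, which is why Python prints "During handling of the above exception, another exception occurred". In other words, the root cause to investigate is the first 403 on publish.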
Hi @shahdghorsi, thanks for reporting! Could you check whether your Argilla instance is running fine, and that you're using an owner user? It's weird, because the error raised is HTTP 403, so it may be due to missing permissions: only owners and workspace admins can create FeedbackDatasets in Argilla, so you may be using a user with an unauthorised role.
Hi @alvarobartt, thanks for getting back to me. The Argilla instance is actually running well, and the users added are all admins, so I am not sure what was wrong.
Hi @alvarobartt, I am actually having a problem with this. The Argilla endpoint is behind an IAP, and I can connect easily, but when I try to push a dataset from my local machine to a specific workspace, I get the following:
ForbiddenApiError: Argilla server returned an error with http status: 403. Error details: {'response': '<!doctype html>
403 403 Forbidden'}
I can see the dataset name in the UI when I log in, but it says "0 results found" despite passing data.
The code I am using works perfectly for an instance started on my localhost, but not for the actual endpoint that I want to use.
I am using `rg.log(records, name=argilla_data_name, workspace="test-workspace")` instead of push_to_argilla.
Could you please help?
Thanks,
Actually, I resolved that error after removing the following part of my code, where I was adding some extra labels. I am not sure what is wrong with it, or why it works without it but not when I add it back:
settings = rg.TextClassificationSettings(label_schema=set(label_list))
rg.configure_dataset_settings(name=argilla_data_name, settings=settings, workspace="test-workspace")
Description
Since it would be useful, we could return the URL of a FeedbackDataset that has been pushed either to Argilla or to the HuggingFace Hub, so the user can easily access the dataset and know where it has been pushed.
Solution
Build the URL for both Argilla and the HuggingFace Hub pointing to the FeedbackDataset that was just uploaded to either one of those.
<API_URL>/<DATASET_ID>/annotation-mode
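For the HuggingFace Hub side, a minimal sketch, assuming the dataset was pushed under a Hub repo id such as the vegeta/testargilla one used earlier in the thread. `hub_dataset_url` is a hypothetical helper; it relies only on the public `https://huggingface.co/datasets/<repo_id>` page layout of the Hub.

```python
def hub_dataset_url(repo_id: str) -> str:
    """Hypothetical helper: web URL of a dataset repository on the HuggingFace Hub."""
    # Dataset repos live under the /datasets/ prefix (model repos do not);
    # strip stray slashes so "user/name/" and "user/name" give the same URL
    return f"https://huggingface.co/datasets/{repo_id.strip('/')}"


print(hub_dataset_url("vegeta/testargilla"))
```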