Closed frascuchon closed 5 months ago
Looks good!!
One question:
This looks like a two-way sync like the one we have for Alpaca (exactly what we want for Spaces)
For more general use cases, could/do we also cover the case where a user just wants to make sure the dataset is stored in a Hub Dataset periodically, without loading the dataset from the Hub back into the Argilla dataset?
(Note: I only skimmed through the code)
I've included some in the PR description but, yes. The idea is that, by skipping hf_source, the sync goes one way only, from Argilla to the HF Dataset hub. The performance of this code is not the best, but the tests I've run worked fine.
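To make the rule above concrete, here is a minimal sketch of how the sync direction could be derived from the two settings. The function name and the returned labels are hypothetical and not taken from the plugin; the "hub_to_argilla" branch is an assumption, since the thread only spells out the one-way (no hf_source) and two-way cases.

```python
from typing import Optional


def sync_direction(hf_source: Optional[str], hf_target: Optional[str]) -> str:
    """Return a label for the sync direction implied by the settings.

    Hypothetical helper for illustration; not part of the plugin.
    """
    if hf_source is None and hf_target is None:
        raise ValueError("At least one of hf_source or hf_target must be set")
    if hf_source is None:
        # As described in the thread: skipping hf_source gives a one-way
        # sync from Argilla to the HF Dataset hub.
        return "argilla_to_hub"
    if hf_target is None:
        # Assumption: only a source means loading from the Hub into Argilla.
        return "hub_to_argilla"
    # Both set (the thread describes the hf_source == hf_target case).
    return "two_way"
```

For example, sync_direction(None, "user/dataset") selects the one-way export.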
I will try to take a look tomorrow with your feedback.
Additionally, if the code is not robust enough, you could try to set up a retry-backoff with smaller chunk sizes? Cheers, David
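A rough sketch of what a retry-backoff with shrinking chunk sizes could look like. This is not the plugin's code: push_in_chunks, push_chunk and all the default values are hypothetical, and push_chunk stands in for whatever call actually uploads a batch of records.

```python
import time
from typing import Callable, List, Sequence


def push_in_chunks(
    records: Sequence[dict],
    push_chunk: Callable[[Sequence[dict]], None],  # hypothetical upload callback
    chunk_size: int = 500,
    min_chunk_size: int = 50,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Push records chunk by chunk; on failure, back off and retry a smaller chunk.

    Returns the size of every chunk that was pushed successfully.
    """
    sizes_used: List[int] = []
    start = 0
    while start < len(records):
        size = chunk_size  # each new chunk starts again at the full size
        for attempt in range(max_retries):
            chunk = records[start:start + size]
            try:
                push_chunk(chunk)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
                # Halve the chunk, but never go below the minimum size.
                size = max(min_chunk_size, size // 2)
        sizes_used.append(len(chunk))
        start += len(chunk)
    return sizes_used
```

The sleep function is injectable so the backoff can be tested without waiting. Catching a narrower exception type than Exception would be preferable once the actual failure modes are known.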
Great! But I think this behavior should be in the core rg.log/rg.load methods
The plugin now works through environment variables. The variable names are aligned with the attribute names of the HuggingfaceSyncConfig class.
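As an illustration of that naming alignment, here is a small sketch that builds a settings object from environment variables. SyncSettings is a hypothetical stand-in, not the real HuggingfaceSyncConfig; the upper-cased variable names and the default frequency are assumptions. Only the attribute names come from the PR description.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass
class SyncSettings:
    """Hypothetical stand-in for HuggingfaceSyncConfig (illustration only)."""

    rg_dataset: Optional[str] = None
    rg_query: Optional[str] = None
    hf_source: Optional[str] = None
    hf_target: Optional[str] = None
    hf_push_to_hub_frequency: int = 60  # seconds; default is an assumption

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SyncSettings":
        # Assumption: each attribute maps to an upper-cased variable name,
        # e.g. hf_target -> HF_TARGET.
        env = os.environ if env is None else env
        values = {}
        for field in fields(cls):
            raw = env.get(field.name.upper())
            if raw is None:
                continue  # keep the dataclass default
            # Integer-valued settings are parsed; everything else stays a string.
            values[field.name] = int(raw) if isinstance(field.default, int) else raw
        return cls(**values)
```

Calling SyncSettings.from_env() with no argument reads from os.environ; passing a plain dict makes it easy to try out without touching the real environment.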
I've created a space here.
As a TODO that can be tackled in another PR:
Also, it would be great to package the argilla-plugins package with the quickstart image.
ok!
what's left to close this version @frascuchon @davidberenstein1957 ?
Maybe a brief description of how to use it?
With this plugin, users can sync data from/to the HF datasets hub and an Argilla instance.
Refs https://github.com/argilla-io/argilla/issues/2614
Description
Since this plugin has several optional settings, it can cover several use cases. Let's enumerate some of them:
(... hf_source)
(... hf_target and rg_dataset parameters)
(... hf_source=hf_target and rg_query=None). Be careful with this, since the whole dataset will be exported every hf_push_to_hub_frequency seconds.