argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

feat: basic Excel/tabular integrations for importing/exporting data #1888

Open davidberenstein1957 opened 1 year ago

davidberenstein1957 commented 1 year ago

Is your feature request related to a problem? Please describe. One of our clients wanted to involve a more diverse team of non-technical people in the annotation process, some of whom might not have any programming experience.

Describe the solution you'd like Be able to import/export Excel files.

Describe alternatives you've considered Do Python client uploads, but that sadly doesn't work.

Additional context N.A.

dvsrepo commented 1 year ago

This is related to #1870, as I understand you are referring to the UI. That said, supporting data upload from Excel can be much more complex and trickier than it seems. I guess that even CSV would be fine for profiles without programming knowledge, but we can discuss value vs. complexity. Exporting as Excel should be fine and easy to do (for non-huge datasets, of course).

Does the comment regarding the Python client mean you tried pandas and from_pandas and it didn't work? Or that it's of course not possible for users without Python skills to do so?
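
To make the CSV option above concrete, here is a minimal sketch of a tabular round trip using only the Python standard library. It does not call Argilla or pandas; the record fields (`text`, `label`) and the helper names `records_to_csv` and `csv_to_records` are assumptions made for illustration, not part of any Argilla API.

```python
import csv
import io

# Hypothetical annotated records; the field names are illustrative only.
records = [
    {"text": "Great product", "label": "positive"},
    {"text": "Arrived broken", "label": "negative"},
]


def records_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_records(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = records_to_csv(records, ["text", "label"])
restored = csv_to_records(csv_text)
```

A non-technical annotator could open the resulting CSV in Excel, edit the `label` column, and hand the file back for re-import. Native `.xlsx` support would need an extra dependency, which is part of the complexity trade-off raised above.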

davidberenstein1957 commented 1 year ago

It is indeed related. It is more about being able to do some basic data importing/downloading without programming knowledge.

dhruvsakalley commented 1 year ago

This is a useful feature, but I would like to point out that even with simple exports, the data usually needs some massaging to fit custom needs. Maybe a macro button that can hold some custom export Python logic would be a very generic solution.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

davidberenstein1957 commented 1 year ago

https://huggingface.co/spaces/argilla/data-manager

dvsrepo commented 1 year ago

@davidberenstein1957 maybe we can add import/export from xls too? wdyt?

dhruvsakalley commented 1 year ago

Thanks for following up on this. While the Data Manager is a great idea, sometimes you want to run a query before you export the data, perhaps to select only a few annotations or to apply a date filter. It would be really nice if the results in context could be exported from the UI.
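
As a rough illustration of the "query before export" idea, the sketch below filters records by annotation status and date before they would be written out. The row fields (`status`, `event_date`) and the function `select_for_export` are hypothetical and only stand in for whatever query the UI or the Data Manager would run.

```python
from datetime import date

# Hypothetical rows; the field names are assumptions made for this sketch.
rows = [
    {"text": "a", "status": "Validated", "event_date": date(2023, 1, 10)},
    {"text": "b", "status": "Default", "event_date": date(2023, 2, 5)},
    {"text": "c", "status": "Validated", "event_date": date(2023, 2, 12)},
]


def select_for_export(rows, status=None, since=None):
    """Keep only rows matching the optional status and on or after the date."""
    selected = []
    for row in rows:
        if status is not None and row["status"] != status:
            continue
        if since is not None and row["event_date"] < since:
            continue
        selected.append(row)
    return selected


# Export only validated annotations from February 2023 onwards.
to_export = select_for_export(rows, status="Validated", since=date(2023, 2, 1))
```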

dvsrepo commented 1 year ago

Very good suggestion @dhruvsakalley. We will introduce this feature directly in the UI in the future.

As an immediate step, I have included the query in the data manager, with a link to the queries docs. This means

[Screenshot 2023-02-13 at 10 55 15]

davidberenstein1957 commented 1 year ago

Hi,

I also added this to the argilla-streamlit repo.

Regards, David

dhruvsakalley commented 1 year ago

Thanks Daniel, I think we can sync a bit on the Data Manager. It's a really good way of wrapping ops related to data, and it seems like a good place to build integrations with databases to enable things like streaming/rolling updates based on SQL triggers, etc. It is also a good place to add label management functionality.

davidberenstein1957 commented 1 year ago

@dhruvsakalley I have fine-tuned this a bit here. Perhaps what you describe would be some kind of listener? I would love to help here from the Argilla side too.

dhruvsakalley commented 1 year ago

Apologies for going off on a tangent way beyond the scope of the issue, but maybe this helps.

There are many ways to approach how this data lands in Argilla. Sure, an event-driven paradigm might be a good solution, but even some kind of polling mechanism would be just as useful for the majority of cases. Whichever architecture you choose to solve this is fine, but here are some of the needs for a rolling update:

These scenarios emerge when we choose to work with continuously updating systems. I think handling some of these scenarios is a missed opportunity for a lot of the annotation tools that I have come across in the past. A lot of data engineering gets left behind, even though these are fairly standard things that can be abstracted away from the end-user experience.

In an ideal world, all the "human work" aspects of data annotation should be under source control, just like we handle code/documentation, but that's probably a conversation to be had over a beer.
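
The polling mechanism mentioned above could be sketched as a watermark-based fetch. This is only an illustration under stated assumptions: the source is modeled as an in-memory list with a monotonically increasing `id`, and `poll_new_rows` is a hypothetical helper rather than anything provided by Argilla.

```python
def poll_new_rows(source_rows, last_seen_id):
    """Return rows newer than the watermark, plus the updated watermark."""
    new_rows = [row for row in source_rows if row["id"] > last_seen_id]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((row["id"] for row in new_rows), default=last_seen_id)
    return new_rows, new_watermark


# Hypothetical upstream table that keeps growing between polls.
source = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

batch_1, watermark = poll_new_rows(source, last_seen_id=0)

source.append({"id": 3, "text": "third"})
batch_2, watermark = poll_new_rows(source, watermark)

# A poll with no new data returns an empty batch and keeps the watermark.
batch_3, watermark = poll_new_rows(source, watermark)
```

Each batch would then be logged to the annotation tool. An event-driven listener would replace the repeated calls with a callback, but the watermark bookkeeping stays much the same.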

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 90 days with no activity.