Imageomics / Image-Datapalooza-2023

Repository for the Image Datapalooza 2023 event held at OSU in August 2023.
Creative Commons Zero v1.0 Universal

Data Dashboard for Expedited EDA #2

Open egrace479 opened 11 months ago

egrace479 commented 11 months ago

There’s a lot of available data, but determining how useful it is (via exploratory data analysis, EDA) can be very time-consuming. I have been working on a dashboard that visualizes distribution information and samples from a dataset efficiently, with no coding required. We have a pre-release hosted for use during Image Datapalooza. It is currently somewhat Imageomics-focused, though images aren’t required to gather distribution statistics. I think it would be great to expand the functionality for a more general audience, as a way to quickly generate visuals for data or to check whether new data would be suitable for experiments without investing large amounts of time in EDA.
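
As a rough illustration of what "distribution statistics" can mean without any images involved, here is a minimal sketch that counts the values in one column of a metadata table. It is not the dashboard's code: the function name `column_distribution`, the sample table, and its column names are all invented for this example.

```python
import csv
import io
from collections import Counter


def column_distribution(csv_text, column):
    """Count how often each value appears in one column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical metadata table; the column names are invented for illustration.
sample = """filename,species,view
a.jpg,melpomene,dorsal
b.jpg,erato,dorsal
c.jpg,melpomene,ventral
"""

counts = column_distribution(sample, "species")
for value, n in counts.most_common():
    print(f"{value}: {n}")
```

Counts like these are the sort of per-column summary a bar chart or histogram would be drawn from; only the tabular metadata is needed to compute them.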

nickynicolson commented 11 months ago

Could this integrate with the practice of documenting datasets via "dataset cards" (see e.g. https://huggingface.co/docs/hub/datasets-cards)?

egrace479 commented 11 months ago

Definitely. I have actually used it to generate a static display for one of the Institute's datasets (currently private) and plan to do so for more. I would be interested in integrating the interactive dashboard into a HF Space as well.

egrace479 commented 11 months ago

It turns out that deploying to a HF Space is relatively straightforward with a Docker container. We have v1.0.0 on HF, and have set up a development version, built from our dev branch, for testing.