Imageomics / Image-Datapalooza-2023

Repository for the Image Datapalooza 2023 event held at OSU in August 2023.
Creative Commons Zero v1.0 Universal

Data Dashboard for Expedited EDA #2

Open egrace479 opened 1 year ago

egrace479 commented 1 year ago

There’s a lot of available data, but determining how useful it is (via exploratory data analysis) can be very time-consuming. I have been working on a dashboard that visualizes distribution information and dataset samples efficiently, with no coding required. We have a pre-release hosted for use during Image Datapalooza. It is currently a bit Imageomics-focused, though images aren’t required to gather distribution statistics. I think it would be great to expand the functionality for a more general audience, as a way to quickly generate visuals for data or to check whether new data would be suitable for experiments without investing large amounts of time in EDA.
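
To give a sense of the kind of distribution summary meant here, below is a minimal sketch using only the Python standard library. It is not the dashboard's code: the function name `column_distribution`, the `species` column, and the sample rows are all made up for illustration.

```python
import csv
import io
from collections import Counter


def column_distribution(csv_text, column):
    """Count how often each value appears in one column of CSV text.

    Returns (counts, missing), where `missing` is the number of rows
    with an empty or absent value in that column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    missing = 0
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
        else:
            missing += 1
    return counts, missing


# Hypothetical metadata file; the rows are invented for this example.
sample = """filename,species,view
img_001.jpg,Heliconius erato,dorsal
img_002.jpg,Heliconius melpomene,dorsal
img_003.jpg,Heliconius erato,ventral
img_004.jpg,,dorsal
"""

counts, missing = column_distribution(sample, "species")
for value, n in counts.most_common():
    print(f"{value}: {n}")
print(f"missing: {missing}")
```

Even a summary this small shows class imbalance and missing labels at a glance, which is the sort of check that otherwise costs time when done by hand for every new dataset.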

nickynicolson commented 1 year ago

Could this integrate with the practice of documenting datasets via "dataset cards"? (See e.g. https://huggingface.co/docs/hub/datasets-cards.)

egrace479 commented 1 year ago

Definitely. I have actually used it to generate a static display for one of the Institute's datasets (currently private) and plan to do so for more. I would be interested in integrating the interactive dashboard into a HF Space as well.

egrace479 commented 1 year ago

Turns out deployment in a HF Space is relatively straightforward with a Docker container. We have v1.0.0 on HF and have set up a development version, running off our dev branch, for testing.