SilentView / LVD-2M

[NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions"
https://silentview.github.io/LVD-2M/

Release LVD-2M on Hugging Face #1

NielsRogge opened 1 month ago

NielsRogge commented 1 month ago

Hi @moatifbutt,

Niels here from the open-source team at Hugging Face. I discovered your work through the daily papers: https://huggingface.co/papers/2410.10816 (17 people upvoted it; feel free to claim it with your HF account). Congrats on getting it accepted to NeurIPS!

I work together with AK on improving the visibility of researchers' work on the hub.

It'd be great to make the dataset available on the 🤗 hub. We can add tags so that people find it when filtering https://huggingface.co/datasets, and they can then load it with:

from datasets import load_dataset

dataset = load_dataset("your-hf-username-or-organization/lvd-2m")

See here for a guide: https://huggingface.co/docs/datasets/image_dataset. We also support webdataset, which will be useful for video datasets: https://huggingface.co/docs/datasets/loading#webdataset.
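For context on the WebDataset option: a WebDataset shard is a plain tar archive in which files sharing the same key (the part of the filename before the first dot) together form one sample. The snippet below is a minimal sketch using only the Python standard library. It assumes each clip is stored as an `.mp4` with its captions in a sidecar `.json`. The helper names `write_shard`/`read_shard`, the `clip_000001` key scheme, and the caption JSON layout are hypothetical illustrations, not the actual LVD-2M file layout.

```python
import io
import json
import tarfile
from collections import defaultdict


def write_shard(path, samples):
    """Write (key, video_bytes, captions) samples into one WebDataset-style tar.

    Hypothetical helper: files sharing a key form one sample, e.g.
    "clip_000001.mp4" (video) and "clip_000001.json" (captions).
    Keys must not contain dots, since the key ends at the first dot.
    """
    with tarfile.open(path, "w") as tar:
        for key, video_bytes, captions in samples:
            members = {
                f"{key}.mp4": video_bytes,
                f"{key}.json": json.dumps(captions).encode("utf-8"),
            }
            for name, data in members.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def read_shard(path):
    """Group tar members back into samples: {key: {extension: bytes}}."""
    samples = defaultdict(dict)
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            key, ext = member.name.split(".", 1)
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)
```

Grouping by key rather than by position is what lets a loader stream samples sequentially out of each tar without an index file, which is why the format suits large video collections.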

There's then also the dataset viewer which allows people to see the first few rows in the browser: https://huggingface.co/docs/hub/en/datasets-viewer.

This would make the dataset more accessible and more discoverable.

We can then also link the dataset to the paper page.

Let us know if you need any help.

Cheers,

Niels, ML Engineer @ HF 🤗

SilentView commented 1 month ago

Thank you for your suggestion! We will upload LVD-2M data files on Hugging Face soon 🚀

NielsRogge commented 1 month ago

Great! Btw the HF team recently worked on a large-scale video dataset too, you might be interested :)

cc @mfarre