cfahlgren1 / hf-data-explorer

Chrome Extension for exploring Hugging Face datasets 🔎
https://chromewebstore.google.com/detail/hugging-face-datasets-exp/algkmpgdgbindfpddilldlogcbhpkhhd
24 stars · 1 fork

bug: Sanitize View Names #16

Closed cfahlgren1 closed 6 days ago

cfahlgren1 commented 1 week ago

We need to normalize view names so that the views can actually be created. Names such as CC-MAIN-2013-20 (from fineweb-edu) fail because hyphens are not valid in unquoted identifiers.

We can also use SELECT * FROM duckdb_keywords(); to get the list of reserved words in DuckDB.
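
A minimal sketch of what that normalization could look like, not the extension's actual implementation. The function name `sanitizeViewName`, the `_view` suffix, and the small `RESERVED_WORDS` set are hypothetical; in practice the set would be filled from the result of `duckdb_keywords()`.

```typescript
// Hypothetical, illustrative subset of reserved words.
// The full list would come from: SELECT * FROM duckdb_keywords();
const RESERVED_WORDS = new Set(["select", "from", "where", "table", "order", "group"]);

function sanitizeViewName(name: string, reserved: Set<string> = RESERVED_WORDS): string {
  // Replace every character that is not an ASCII letter, digit, or underscore.
  let sanitized = name.replace(/[^A-Za-z0-9_]/g, "_");

  // Unquoted identifiers cannot be empty or start with a digit.
  if (sanitized === "" || /^[0-9]/.test(sanitized)) {
    sanitized = `_${sanitized}`;
  }

  // Avoid clashing with a reserved word (the suffix is an arbitrary choice).
  if (reserved.has(sanitized.toLowerCase())) {
    sanitized = `${sanitized}_view`;
  }

  return sanitized;
}
```

With this sketch, `sanitizeViewName("CC-MAIN-2013-20")` gives `CC_MAIN_2013_20`. Two different source names can map to the same sanitized name (for example `a-b` and `a_b`), so a real implementation would also need to deduplicate.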

image
cfahlgren1 commented 1 week ago

Sometimes the datasets server returns a 500 but recovers on the next request. Maybe we can add a retry as well.
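
One possible shape for such a retry, as a sketch under assumptions: the helper name `fetchWithRetry`, the default of three attempts, and the exponential backoff delays are all hypothetical choices, not something taken from the extension's code. The request is passed in as a callback so the helper does not depend on a specific fetch API.

```typescript
// Retries a request while the server answers with a 5xx status.
// `doFetch` performs one request; all defaults here are illustrative assumptions.
async function fetchWithRetry<T extends { status: number }>(
  doFetch: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let response = await doFetch();

  for (let attempt = 1; attempt < maxAttempts && response.status >= 500; attempt++) {
    // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
    const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    response = await doFetch();
  }

  // Either a non-5xx response or the last 5xx after all attempts were used.
  return response;
}
```

A call site might look like `fetchWithRetry(() => fetch(url))`. This sketch only retries on 5xx statuses; errors thrown by the request itself (for example a network failure) propagate to the caller unchanged.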

cfahlgren1 commented 1 week ago

Additionally, some datasets don't have a Parquet export / conversion (example). We should show a small label so the user knows why there are no views for that specific dataset.
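
A rough sketch of how the label decision could be made. It assumes the Parquet listing response carries a `parquet_files` array; that shape, the `ParquetListing` type, the `missingParquetLabel` name, and the label text are all assumptions for illustration.

```typescript
// Assumed response shape: an object with an optional `parquet_files` array.
interface ParquetListing {
  parquet_files?: unknown[];
}

// Returns a label to show when no views can be created, or null otherwise.
function missingParquetLabel(listing: ParquetListing | null): string | null {
  const files = listing?.parquet_files ?? [];
  return files.length === 0
    ? "No Parquet conversion available for this dataset"
    : null;
}
```

Returning `null` when files exist lets the UI skip rendering the label entirely.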

cfahlgren1 commented 6 days ago
image

For datasets like this one that don't have Parquet conversions, we can show a label.