allenporter / home-assistant-datasets

This package is a collection of datasets for evaluating AI Models in the context of Home Assistant.
https://allenporter.github.io/home-assistant-datasets

Update dependency datasets to v3 #41

Closed renovate[bot] closed 2 months ago

renovate[bot] commented 2 months ago

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| datasets | `==2.21.0` -> `==3.0.0` | age | adoption | passing | confidence |

Release Notes

huggingface/datasets (datasets)

### [`v3.0.0`](https://redirect.github.com/huggingface/datasets/releases/tag/3.0.0)

[Compare Source](https://redirect.github.com/huggingface/datasets/compare/2.21.0...3.0.0)

#### Dataset Features

- Use Polars functions in `.map()`
  - Allow Polars as valid output type by [@psmyth94](https://redirect.github.com/psmyth94) in [https://github.com/huggingface/datasets/pull/6762](https://redirect.github.com/huggingface/datasets/pull/6762)
  - Example:

    ```python
    >>> import polars as pl
    >>> from datasets import load_dataset
    >>> ds = load_dataset("lhoestq/CudyPokemonAdventures", split="train").with_format("polars")
    >>> cols = [pl.col("content").str.len_bytes().alias("length")]
    >>> ds_with_length = ds.map(lambda df: df.with_columns(cols), batched=True)
    >>> ds_with_length[:5]
    shape: (5, 5)
    ┌─────┬───────────────────────────────────┬───────────────────────────────────┬───────────────────────┬────────┐
    │ idx ┆ title                             ┆ content                           ┆ labels                ┆ length │
    │ --- ┆ ---                               ┆ ---                               ┆ ---                   ┆ ---    │
    │ i64 ┆ str                               ┆ str                               ┆ str                   ┆ u32    │
    ╞═════╪═══════════════════════════════════╪═══════════════════════════════════╪═══════════════════════╪════════╡
    │ 0   ┆ The Joyful Adventure of Bulbasau… ┆ Bulbasaur embarked on a sunny qu… ┆ joyful_adventure      ┆ 180    │
    │ 1   ┆ Pikachu's Quest for Peace         ┆ Pikachu, with his cheeky persona… ┆ peaceful_narrative    ┆ 138    │
    │ 2   ┆ The Tender Tale of Squirtle       ┆ Squirtle took everyone on a memo… ┆ gentle_adventure      ┆ 135    │
    │ 3   ┆ Charizard's Heartwarming Tale     ┆ Charizard found joy in helping o… ┆ heartwarming_story    ┆ 112    │
    │ 4   ┆ Jolteon's Sparkling Journey       ┆ Jolteon, with his zest for life,… ┆ celebratory_narrative ┆ 111    │
    └─────┴───────────────────────────────────┴───────────────────────────────────┴───────────────────────┴────────┘
    ```

- Support NumPy 2
  - Allow numpy-2.1 and test it without audio extra by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7118](https://redirect.github.com/huggingface/datasets/pull/7118)

#### Cache Changes

- Use `huggingface_hub` cache by [@lhoestq](https://redirect.github.com/lhoestq) in [https://github.com/huggingface/datasets/pull/7105](https://redirect.github.com/huggingface/datasets/pull/7105)
  - use the `huggingface_hub` cache for files downloaded from HF, by default at `~/.cache/huggingface/hub`
  - cached datasets (Arrow files) will still be reloaded from the `datasets` cache, by default at `~/.cache/huggingface/datasets`

#### Breaking changes

- Remove deprecated code by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/6996](https://redirect.github.com/huggingface/datasets/pull/6996)
  - removed deprecated arguments like `use_auth_token`, `fs` or `ignore_verifications`
- Remove beam by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/6987](https://redirect.github.com/huggingface/datasets/pull/6987)
  - removed deprecated apache beam datasets support
- Remove metrics by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/6983](https://redirect.github.com/huggingface/datasets/pull/6983)
  - remove deprecated `load_metric`, please use the `evaluate` library instead
- Remove tasks by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/6999](https://redirect.github.com/huggingface/datasets/pull/6999)
  - remove deprecated `task` argument in `load_dataset()`, `.prepare_for_task()` method, `datasets.tasks` module

#### General improvements and bug fixes

- Improved the tutorial by adding a link for loading datasets by [@AmboThom](https://redirect.github.com/AmboThom) in [https://github.com/huggingface/datasets/pull/7042](https://redirect.github.com/huggingface/datasets/pull/7042)
- Automatically create `cache_dir` from `cache_file_name` by [@ringohoffman](https://redirect.github.com/ringohoffman) in [https://github.com/huggingface/datasets/pull/7096](https://redirect.github.com/huggingface/datasets/pull/7096)
- remove more script docs by [@lhoestq](https://redirect.github.com/lhoestq) in [https://github.com/huggingface/datasets/pull/7104](https://redirect.github.com/huggingface/datasets/pull/7104)
- Fix args of feature docstrings by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7103](https://redirect.github.com/huggingface/datasets/pull/7103)
- Temporarily pin numpy<2.1 to fix CI by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7114](https://redirect.github.com/huggingface/datasets/pull/7114)
- Fix ConnectionError for gated datasets and unauthenticated users by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7110](https://redirect.github.com/huggingface/datasets/pull/7110)
- Install transformers with numpy-2 CI by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7119](https://redirect.github.com/huggingface/datasets/pull/7119)
- don't mention the script if trust_remote_code=False by [@severo](https://redirect.github.com/severo) in [https://github.com/huggingface/datasets/pull/7120](https://redirect.github.com/huggingface/datasets/pull/7120)
- Fix typed examples iterable state dict by [@lhoestq](https://redirect.github.com/lhoestq) in [https://github.com/huggingface/datasets/pull/7121](https://redirect.github.com/huggingface/datasets/pull/7121)
- Rename LargeList.dtype to LargeList.feature by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7106](https://redirect.github.com/huggingface/datasets/pull/7106)
- Fix wrong SHA in CI tests of HubDatasetModuleFactoryWithParquetExport by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7125](https://redirect.github.com/huggingface/datasets/pull/7125)
- Disable implicit token in CI by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7126](https://redirect.github.com/huggingface/datasets/pull/7126)
- Test get_dataset_config_info with non-existing/gated/private dataset by [@albertvillanova](https://redirect.github.com/albertvillanova) in [https://github.com/huggingface/datasets/pull/7124](https://redirect.github.com/huggingface/datasets/pull/7124)
- fix streaming from arrow files by [@fschlatt](https://redirect.github.com/fschlatt) in [https://github.com/huggingface/datasets/pull/7083](https://redirect.github.com/huggingface/datasets/pull/7083)

#### New Contributors

- [@AmboThom](https://redirect.github.com/AmboThom) made their first contribution in [https://github.com/huggingface/datasets/pull/7042](https://redirect.github.com/huggingface/datasets/pull/7042)
- [@fschlatt](https://redirect.github.com/fschlatt) made their first contribution in [https://github.com/huggingface/datasets/pull/7083](https://redirect.github.com/huggingface/datasets/pull/7083)

**Full Changelog**: https://github.com/huggingface/datasets/compare/2.21.0...3.0.0
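Before merging a bump like this, it can help to check whether the project still passes any of the keyword arguments that the breaking changes above remove. The following is a minimal sketch, not part of `datasets` or of this repository: `REMOVED_KWARGS` and `find_removed_kwargs` are hypothetical names, and the suggested replacements (`token`, `verification_mode`, `storage_options`) are assumptions based on the 2.x deprecation messages, so they should be verified against the `datasets` documentation.

```python
# Hypothetical migration helper; none of these names come from the `datasets` library.
# Keys are the arguments the 3.0.0 release notes list as removed. Values are the
# assumed replacements (taken from 2.x deprecation messages, not from the notes),
# or None where the notes name no replacement.
REMOVED_KWARGS = {
    "use_auth_token": "token",
    "ignore_verifications": "verification_mode",
    "fs": "storage_options",
    "task": None,
}


def find_removed_kwargs(kwargs):
    """Return (name, suggested_replacement) pairs for removed arguments found in kwargs."""
    return [(name, REMOVED_KWARGS[name]) for name in kwargs if name in REMOVED_KWARGS]


# Example: keyword arguments a project might still pass to load_dataset().
legacy_call = {"split": "train", "use_auth_token": True}
for name, replacement in find_removed_kwargs(legacy_call):
    hint = f"use `{replacement}` instead" if replacement else "no direct replacement"
    print(f"`{name}` was removed in datasets 3.0.0: {hint}")
```

The helper only inspects a dictionary of keyword arguments, so it runs without `datasets` installed. A real audit would more likely grep the codebase or run the test suite against the new pin.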

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.



This PR was generated by Mend Renovate. View the repository job log.