allenai / tailor

Apache License 2.0

Bump datasets from 1.15.1 to 2.13.0 #170

Closed dependabot[bot] closed 1 year ago

dependabot[bot] commented 1 year ago

Bumps datasets from 1.15.1 to 2.13.0.
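The jump from 1.15.1 to 2.13.0 changes the first version component, so it counts as a major-version bump. A minimal sketch of that check follows, assuming plain dotted numeric versions; `parse_version` and `classify_bump` are hypothetical helper names for illustration, not part of Dependabot or `datasets`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as "2.13.0" into integers."""
    return tuple(int(part) for part in version.split("."))


def classify_bump(old: str, new: str) -> str:
    """Name the first version component that differs between old and new."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    labels = ("major", "minor", "patch")
    for label, before, after in zip(labels, old_parts, new_parts):
        if before != after:
            return label
    return "none"  # the two versions are identical


print(classify_bump("1.15.1", "2.13.0"))  # prints "major"
```

Pre-release or local version suffixes (for example `2.13.0rc1`) are outside the scope of this sketch; a full version parser would be needed for those.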

Release notes

Sourced from datasets's releases.

2.13.0

Dataset Features

  • Add IterableDataset.from_spark by @​maddiedawson in huggingface/datasets#5770

    • Stream the data from your Spark DataFrame directly to your training pipeline
    from datasets import IterableDataset
    from torch.utils.data import DataLoader

    ids = IterableDataset.from_spark(df)
    ids = ids.map(...).filter(...).with_format("torch")
    for batch in DataLoader(ids, batch_size=16, num_workers=4):
        ...

  • IterableDataset formatting for PyTorch, TensorFlow, Jax, NumPy and Arrow:

    from datasets import load_dataset

    ids = load_dataset("c4", "en", split="train", streaming=True)
    ids = ids.map(...).with_format("torch")  # to get PyTorch tensors - also works with tf, np, jax etc.

  • Add IterableDataset.from_file to load local dataset as iterable by @​mariusz-jachimowicz-83 in huggingface/datasets#5893

    from datasets import IterableDataset

    ids = IterableDataset.from_file("path/to/data.arrow")

  • Arrow dataset builder to be able to load and stream Arrow datasets by @​mariusz-jachimowicz-83 in huggingface/datasets#5944

    from datasets import load_dataset

    ds = load_dataset("arrow", data_files={"train": "train.arrow", "test": "test.arrow"})

Experimental

General improvements and bug fixes

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 1 year ago

Superseded by #172.