Before submitting

- [ ] Was this discussed/agreed via a GitHub issue? (not needed for typos and docs improvements)
- [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/.github/CONTRIBUTING.md), Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?

This PR enables merging optimized datasets together.

Create 2 different datasets

from litdata import optimize

# Turn each input index into one sample: the index and its square.
def compress(index):
    return index, index**2

if __name__ == "__main__":
    # Add some data
    optimize(
        fn=compress,
        inputs=list(range(100)),
        output_dir="/teamspace/s3_connections/laoin-400m/folder_1",
        chunk_bytes="64MB",
    )

from litdata import optimize

# Same sample function as above; only the output folder differs.
def compress(index):
    return index, index**2

if __name__ == "__main__":
    # Add some data
    optimize(
        fn=compress,
        inputs=list(range(100)),
        output_dir="/teamspace/s3_connections/laoin-400m/folder_2",
        chunk_bytes="64MB",
    )

Merge them into a third one

from litdata import merge_datasets

merge_datasets(
    input_dirs=[
        "/teamspace/s3_connections/laoin-400m/folder_1",
        "/teamspace/s3_connections/laoin-400m/folder_2"
    ],
    output_dir="/teamspace/s3_connections/laoin-400m/folder_3"
)
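To make the idea of merging concrete, here is a minimal conceptual sketch using only the standard library. It is not litdata's implementation: the helper names `merge_indexes` and `total_samples`, and the index layout (`config`, `chunks`, `filename`, `chunk_size`), are assumptions chosen for illustration. It shows the two things a merge of this kind plausibly has to do: refuse datasets written with different settings, and renumber chunk files so names from different inputs do not collide.

```python
from typing import Any, Dict, List


def merge_indexes(indexes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: combine several index-like dicts into one."""
    if not indexes:
        raise ValueError("need at least one index to merge")

    config = indexes[0]["config"]
    merged_chunks: List[Dict[str, Any]] = []
    for index in indexes:
        # Datasets written with different settings cannot share one index.
        if index["config"] != config:
            raise ValueError("all datasets must share the same config")
        for chunk in index["chunks"]:
            # Renumber chunk files so names from different inputs never collide.
            new_name = f"chunk-0-{len(merged_chunks)}.bin"
            merged_chunks.append({**chunk, "filename": new_name})
    return {"config": config, "chunks": merged_chunks}


def total_samples(index: Dict[str, Any]) -> int:
    """Sum the per-chunk sample counts of an index-like dict."""
    return sum(chunk["chunk_size"] for chunk in index["chunks"])


# Two toy indexes standing in for folder_1 and folder_2 (100 samples each).
folder_1 = {
    "config": {"chunk_bytes": "64MB"},
    "chunks": [{"filename": "chunk-0-0.bin", "chunk_size": 100}],
}
folder_2 = {
    "config": {"chunk_bytes": "64MB"},
    "chunks": [{"filename": "chunk-0-0.bin", "chunk_size": 100}],
}

merged = merge_indexes([folder_1, folder_2])
print(total_samples(merged))                     # 200
print([c["filename"] for c in merged["chunks"]])  # ['chunk-0-0.bin', 'chunk-0-1.bin']
```

With the two 100-sample datasets from the example above, the merged dataset would be expected to hold 200 samples, which is a quick sanity check to run after a real merge.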

What does this PR do?

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


Add utility to merge datasets together #190
