Save Datasets to Different Directories

EthioNLP / afri-sft-data

This repository generates instruction-tuning datasets from different source datasets.

Save Datasets to Different Directories #9

Closed amaneth closed 8 months ago

amaneth commented 8 months ago

    import os

    output_path = "../logs/afri-rlhf"
    # Create the output directory if it does not already exist.
    os.makedirs(output_path, exist_ok=True)
    dataset.save_to_disk(output_path)

It seems the train, test, and validation splits of the datasets are being saved to the same directory, so each time a dataset is saved it overwrites the previous one. We may need to save them in different directories.
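A minimal sketch of what saving to different directories could look like, assuming the splits are available as a mapping from split name to an object with a `save_to_disk(path)` method (as Hugging Face `datasets.Dataset` provides). The helper name `save_splits` and the one-subdirectory-per-split layout are hypothetical, not taken from the repository:

```python
from pathlib import Path


def save_splits(splits, base_dir="../logs/afri-rlhf"):
    """Save each split under its own subdirectory of base_dir.

    `splits` maps a split name (e.g. "train") to an object exposing
    save_to_disk(path). Returns a dict of split name -> directory used.
    Hypothetical helper for illustration only.
    """
    saved = {}
    for name, split in splits.items():
        # One directory per split, e.g. ../logs/afri-rlhf/train
        split_dir = Path(base_dir) / name
        split_dir.mkdir(parents=True, exist_ok=True)
        split.save_to_disk(str(split_dir))
        saved[name] = split_dir
    return saved
```

This would only matter if the splits are saved separately; each call then writes to its own path, so a later split cannot overwrite an earlier one.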

amaneth commented 8 months ago

Sorry, I forgot the datasets are being concatenated :)