amzn / faithful-data2text-cycle-training

data and text file format #3

Open jessicalundin opened 10 months ago

jessicalundin commented 10 months ago

Thank you for writing a great paper. Doing more with less data is always a helpful contribution to the research community.

What is the file format for the text_file and data_file parameters? I'm new to working with these datasets, so maybe there is a standard format. Many thanks!

Here is what I tried to create the files:

    import json
    from datasets import load_dataset

    dataset = load_dataset("web_nlg", "release_v1")
    with open("text.txt", "w") as file:
        json.dump({"text": text}, file)
    with open("data.txt", "w") as file:
        json.dump({"text": data}, file)

This runs with the following command, but the results are not as expected:
python cycle_training.py --text_file text.txt --data_file data.txt --output_dir output --do_train --text2data_model google/flan-t5-base --data2text_model google/flan-t5-base

Edillower commented 10 months ago

A plain txt file would do the job. See the FAQ section at https://github.com/Edillower/CycleNLG for preprocessing guidance and sample input data. Feel free to let me know if you have any further questions.
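
For readers landing on this thread, here is a minimal sketch of what writing "plain txt" inputs could look like, in contrast to the JSON dump attempted above. It assumes one example per line in each file. The `[S]`/`[P]`/`[O]` tags, the `linearize_triples` and `write_lines` helpers, and the two sample records are illustrative assumptions, not taken from the repository; the FAQ linked above is the authority on the expected linearization.

```python
from pathlib import Path

# Illustrative records only (hand-written, shaped like "subject | predicate | object"
# triples); replace with records read from your own dataset.
samples = [
    {
        "triples": ["Aarhus_Airport | cityServed | Aarhus"],
        "text": "Aarhus Airport serves the city of Aarhus.",
    },
    {
        "triples": [
            "Alan_Bean | occupation | Test_pilot",
            "Alan_Bean | nationality | United_States",
        ],
        "text": "Alan Bean is a test pilot from the United States.",
    },
]


def linearize_triples(triples):
    """Hypothetical helper: flatten a list of triples into one line.

    The [S]/[P]/[O] tag scheme is an assumption; check the FAQ for the
    format the training script actually expects.
    """
    parts = []
    for triple in triples:
        # Split into at most three fields so a "|" inside the object survives.
        subj, pred, obj = (field.strip() for field in triple.split("|", 2))
        parts.append(f"[S] {subj} [P] {pred} [O] {obj}")
    return " ".join(parts)


def write_lines(path, lines):
    """Write one example per line, collapsing any internal newlines or extra spaces."""
    cleaned = [" ".join(line.split()) for line in lines]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")


data_lines = [linearize_triples(s["triples"]) for s in samples]
text_lines = [s["text"] for s in samples]

write_lines("data.txt", data_lines)
write_lines("text.txt", text_lines)
```

The resulting `text.txt` and `data.txt` are raw text rather than JSON objects, and can be passed to `--text_file` and `--data_file` as in the command above.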