TalwalkarLab / leaf

Leaf: A Benchmark for Federated Settings
BSD 2-Clause "Simplified" License

Preprocessing of sent140 #55

Open Robot-Zhang opened 2 years ago

Robot-Zhang commented 2 years ago

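When preprocessing sent140, the intermediate .csv file written by combine_data.py contains blank lines between rows, which causes data_to_json.py to fail. This looks like the usual result of opening the output file without newline='' on a platform that writes \r\n line endings, such as Windows.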

An encoding error is also reported.

I suggest changing line 27 of combine_data.py to:

with open(out_file_name, 'w', encoding='ISO-8859-1', newline='') as f:
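
For anyone who wants to check the effect of that change in isolation, here is a minimal sketch. It is not the actual code from combine_data.py: the helper names write_rows and read_rows, the sample rows, and the temporary file name are all made up for illustration. It only shows that opening the file with newline='' and encoding='ISO-8859-1' lets the csv module write rows without extra blank lines and read them back unchanged.

```python
import csv
import os
import tempfile


def write_rows(out_file_name, rows):
    """Write rows to a CSV file (hypothetical helper, not from the repo)."""
    # newline='' leaves line endings to csv.writer, so no extra '\r' is
    # added on platforms that translate '\n' to '\r\n' on write.
    with open(out_file_name, 'w', encoding='ISO-8859-1', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(rows)


def read_rows(in_file_name):
    """Read all rows back with the same encoding (hypothetical helper)."""
    with open(in_file_name, 'r', encoding='ISO-8859-1', newline='') as f:
        return list(csv.reader(f))


# Made-up sample rows; the second field of the first row contains a
# non-ASCII character that ISO-8859-1 can represent.
rows = [
    ["0", "user_a", "caf\u00e9 is closed"],
    ["4", "user_b", "great day"],
]

path = os.path.join(tempfile.mkdtemp(), "sketch.csv")
write_rows(path, rows)
round_tripped = read_rows(path)

# Inspect the raw bytes to confirm there are no doubled line endings.
with open(path, 'rb') as f:
    raw = f.read()
```

If the rows that come back match the rows written and the raw bytes contain no `\r\r\n` sequence, the blank-line problem should not appear in the intermediate file.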