Closed JonathanJao closed 2 years ago
Thanks @JonathanJao for bringing this to our attention. Let me take a look into this!
Hi @JonathanJao
While I am working on this, could you please kindly provide the following information?
python -c 'import sys; print(sys.getdefaultencoding())'
Also, could you please try this fix on your side?
Go to tasks/fewshot_gym_dataset.py, line 6:
https://github.com/INK-USC/CrossFit/blob/08e6381e967c065c0d10b99e89dcb9ec2a583d86/tasks/fewshot_gym_dataset.py#L6
Change this line to:
with open(out_file, "w", encoding="utf-8") as fout:
and rerun the dataset building scripts. Let me know if this works.
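For reference, here is a minimal sketch of what the suggested change could look like in isolation. It is written as a standalone function rather than the repository's method, and the sample rows and temporary file path are made up for illustration; only the open(...) call with encoding="utf-8" comes from the suggestion above.

```python
import os
import tempfile


def write_to_tsv(lst, out_file):
    # Passing encoding explicitly means the output no longer depends on
    # the locale that Python would otherwise use for text files.
    with open(out_file, "w", encoding="utf-8") as fout:
        for line in lst:
            fout.write("{}\t{}\n".format(line[0], line[1]))


# Hypothetical rows containing non-ASCII characters.
rows = [("café question", "naïve answer"), ("plain", "ascii")]
tmp_path = os.path.join(tempfile.mkdtemp(), "sample_train.tsv")
write_to_tsv(rows, tmp_path)

# Read the file back with the same encoding to check the round trip.
with open(tmp_path, encoding="utf-8") as fin:
    round_trip = fin.read()
```

If the round trip preserves the non-ASCII characters, the same one-line change should be safe to apply to the real method.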
Hi, thanks for replying! Here are the details you requested:
> python -c 'import sys; print(sys.getdefaultencoding())'
utf-8
> python --version
Python 3.6.9 :: Anaconda, Inc.
> lsb_release -a
LSB Version: core-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
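A side note on the check above: in Python 3, sys.getdefaultencoding() always reports utf-8, so it may not reveal the encoding that open() actually picks when none is given. A small sketch, using only the standard library, that prints the values more likely to matter here (writing to os.devnull is just a harmless way to open a text file):

```python
import locale
import os
import sys

# Always "utf-8" on Python 3, regardless of the environment.
default_codec = sys.getdefaultencoding()

# The locale-derived encoding, which open() normally falls back to
# when no encoding argument is passed.
open_codec = locale.getpreferredencoding(False)

# The encoding actually attached to a text file opened without one.
with open(os.devnull, "w") as fout:
    file_codec = fout.encoding

print(default_codec, open_codec, file_codec)
```

If the last two values are something like ANSI_X3.4-1968 or ascii, that would be consistent with an encoding error when writing non-ASCII text.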
I haven't tried the line you sent over yet, but what worked for me seemed to be the following:
> git diff .
diff --git a/tasks/fewshot_gym_dataset.py b/tasks/fewshot_gym_dataset.py
index 5b21d85..75013cc 100644
--- a/tasks/fewshot_gym_dataset.py
+++ b/tasks/fewshot_gym_dataset.py
@@ -5,7 +5,7 @@ class FewshotGymDataset():
     def write_to_tsv(self, lst, out_file):
         with open(out_file, "w") as fout:
             for line in lst:
-                fout.write("{}\t{}\n".format(line[0], line[1]))
+                fout.write("{}\t{}\n".format(str(line[0]).encode('utf-8'), str(line[1]).encode('utf-8')))
 class FewshotGymClassificationDataset(FewshotGymDataset):
@@ -104,4 +104,4 @@ class FewshotGymTextToTextDataset(FewshotGymDataset):
         self.write_to_tsv(k_shot_dev, prefix + "_dev.tsv")
         self.write_to_tsv(k_shot_test, prefix + "_test.tsv")
-        return k_shot_train, k_shot_dev, k_shot_test
\ No newline at end of file
+        return k_shot_train, k_shot_dev, k_shot_test
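One thing worth checking with the workaround in this diff: on Python 3, formatting a bytes object with "{}" inserts its repr, so the file gets text like b'...' with escaped bytes rather than the original characters. That avoids the encoding error, but it changes the data. A small sketch with a made-up sample string:

```python
text = "café"  # hypothetical sample value

# What the workaround writes: the repr of the encoded bytes.
via_encode = "{}\t{}\n".format(str(text).encode("utf-8"), str("ok").encode("utf-8"))

# What the original format call produces: the text itself.
via_plain = "{}\t{}\n".format(text, "ok")

print(repr(via_encode))
print(repr(via_plain))
```

Here via_encode starts with b' and contains \xc3\xa9 as literal characters, while via_plain keeps the é, so opening the file with encoding="utf-8" is probably the safer fix.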
Hi, I tried to follow the README, but building the gym fails on certain datasets. As the README suggests, I ran some of the task Python scripts individually, but quite a few of them fail with an encoding error. For instance, when I run
python mocha.py
in the tasks directory, it gives me this error message:
or when I run
python anli.py
it gives:
This occurs for about half of the available datasets, from what I can tell; the other half give no errors and are marked as successes when building the gym. Any details on how to fix this would be appreciated!