Hi, thanks for your question! For Waterbirds, the metadata.csv file is included in the directory when you download the dataset here. The other datasets similarly expect a CSV metadata file, which should be downloaded automatically along with the dataset. Let me know if more clarification on this would be helpful!
Thanks Annie! Looks like some additional processing is needed to convert the metadata files from CelebA into one metadata.csv file. I was able to make it work, thanks!
For CelebA, there is no metadata.csv file. Could you directly provide that? @anniesch
Hi @Haoxiang-Wang, a metadata.csv file is not expected for CelebA. From the "Downloading Datasets" portion of the README, the files/folders that the code expects for CelebA are data/list_eval_partition.csv, data/list_attr_celeba.csv, and data/img_align_celeba/.
That is not correct. Your dataset script data/celebA_dataset.py assumes there is a metadata.csv, which should be a merge of the two native CSV files in the CelebA dataset. The bash command generated by your generate_downstream.py likewise assumes a metadata.csv file exists. @anniesch
Ah ok, I will add that to the README. For now, I think a simple call to pd.merge(df1, df2) followed by df.to_csv('metadata.csv') should be enough to merge the two native CSV files. Let me know if you have issues with that.
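For concreteness, a minimal standalone sketch of that merge might look like the following (this assumes the two CSVs live under data/ as in the README and share an image-identifier column; adjust paths to your setup):
import os
import pandas as pd

# Assumed location of the CelebA CSVs, following the README layout
data_dir = "data"

# Read the train/val/test partition assignments and the attribute labels
partition_df = pd.read_csv(os.path.join(data_dir, "list_eval_partition.csv"))
attr_df = pd.read_csv(os.path.join(data_dir, "list_attr_celeba.csv"))

# With no explicit key, pd.merge joins on the column(s) the two files share
# (the image identifier), giving one row per image with partition + attributes
metadata = pd.merge(partition_df, attr_df)
metadata.to_csv(os.path.join(data_dir, "metadata.csv"), index=False)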
Just add the following code after line 30 in data/celebA_dataset.py, after setting the model_type.
# Load CelebA's two native CSVs: the train/val/test partition and the attribute labels
df1 = pd.read_csv(os.path.join(self.root_dir, "data", "list_eval_partition.csv"))
df2 = pd.read_csv(os.path.join(self.root_dir, "data", "list_attr_celeba.csv"))
# Merge on the shared image-identifier column and expose the partition as the "split" column
merged_df = pd.merge(df1, df2)
merged_df["split"] = merged_df["partition"]
# Write the combined metadata.csv that the dataset script expects
merged_df.to_csv(os.path.join(self.root_dir, "data", metadata_csv_name), header=True, index=False)
How do I get the metadata.csv file for MultiNLI?
Hi, thanks for the great work! It seems that metadata.csv is missing; is there a place to download or generate it?