anniesch / jtt

Code for "Just Train Twice: Improving Group Robustness without Training Group Information"

metadata.csv #1

Closed: hlml closed this issue 3 years ago

hlml commented 3 years ago

Hi, thanks for the great work! It seems that metadata.csv is missing. Is there a place to download or generate it?

anniesch commented 3 years ago

Hi, thanks for your question! For Waterbirds, the metadata.csv file is included in the directory when you download the dataset here. The other datasets also expect a csv file with metadata, which should also be automatically downloaded when downloading the dataset. Let me know if more clarification on this would be helpful!

hlml commented 3 years ago

Thanks Annie! Looks like some additional processing is needed to convert the metadata files from celebA into one metadata.csv file. I was able to make it work, thanks!

Haoxiang-Wang commented 2 years ago

For celebA, there is no metadata.csv file. Could you directly provide that? @anniesch

anniesch commented 2 years ago

Hi @Haoxiang-Wang, a metadata.csv file is not expected for CelebA. From the "Downloading Datasets" portion of the README, the files/folders that the code expects are data/list_eval_partition.csv, data/list_attr_celeba.csv, and data/img_align_celeba/ for CelebA.
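As a quick sanity check before running the code, a small script like the one below could confirm that the three CelebA paths named in the README are in place. This is a minimal sketch, not part of the jtt repo: the helper name check_celeba_files and the default root "." are hypothetical, and it only tests that the paths exist, not what the files contain.

```python
import os

# Paths the README lists for CelebA, relative to the repository root.
# check_celeba_files is a hypothetical helper, not part of the jtt code.
EXPECTED_CELEBA_PATHS = [
    os.path.join("data", "list_eval_partition.csv"),
    os.path.join("data", "list_attr_celeba.csv"),
    os.path.join("data", "img_align_celeba"),
]


def check_celeba_files(root_dir="."):
    """Return the expected CelebA paths that are missing under root_dir."""
    missing = []
    for rel_path in EXPECTED_CELEBA_PATHS:
        full_path = os.path.join(root_dir, rel_path)
        # The image folder must be a directory; the two CSVs must be files.
        if rel_path.endswith("img_align_celeba"):
            ok = os.path.isdir(full_path)
        else:
            ok = os.path.isfile(full_path)
        if not ok:
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    missing = check_celeba_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected CelebA files are present.")
```

An empty return value only means the paths exist; the images and CSV contents still need to come from the official CelebA download.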

Haoxiang-Wang commented 2 years ago

That is not correct. Your dataset script data/celebA_dataset.py assumes there is a metadata.csv, which should be a merge of the two native CSV files in the CelebA dataset. The bash command generated by your generate_downstream.py also assumes there is a metadata.csv file. @anniesch

anniesch commented 2 years ago

Ah ok, I will add that to the README. For now, I think a simple call to pd.merge(df1, df2), followed by calling to_csv('metadata.csv') on the merged DataFrame, should be enough to combine the two native CSV files. Let me know if you have issues with that.
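A standalone version of that suggestion might look like the sketch below. It rests on a few assumptions: the two native CSVs share only the image ID column (so pd.merge's default join keys are correct), list_eval_partition.csv has a "partition" column that gets copied into "split" the same way as in the in-class snippet further down this thread, and the function name build_celeba_metadata is hypothetical. Using an outer join with indicator=True and validate="one_to_one" makes the merge fail loudly if the two files do not line up row for row, instead of silently dropping images.

```python
import os

import pandas as pd


def build_celeba_metadata(data_dir, out_name="metadata.csv"):
    """Merge CelebA's partition and attribute CSVs into one metadata file.

    Hypothetical helper. Assumes list_eval_partition.csv has a "partition"
    column and that the two files share only the image ID column.
    """
    partitions = pd.read_csv(os.path.join(data_dir, "list_eval_partition.csv"))
    attributes = pd.read_csv(os.path.join(data_dir, "list_attr_celeba.csv"))

    # Join on the shared column(s). The outer join plus indicator exposes any
    # image listed in only one file; validate rejects duplicated image IDs.
    merged = pd.merge(
        partitions,
        attributes,
        how="outer",
        indicator=True,
        validate="one_to_one",
    )
    unmatched = int((merged["_merge"] != "both").sum())
    if unmatched:
        raise ValueError(f"{unmatched} images appear in only one CSV")
    merged = merged.drop(columns="_merge")

    # Copy CelebA's official partition (0 = train, 1 = val, 2 = test)
    # into a "split" column, mirroring the in-class snippet in this thread.
    merged["split"] = merged["partition"]

    out_path = os.path.join(data_dir, out_name)
    merged.to_csv(out_path, header=True, index=False)
    return out_path
```

Run it once with data_dir pointing at the folder that holds the two CSVs (the data/ directory from the README) before training; the returned path is where metadata.csv was written.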

shoaibahmed commented 2 years ago

Just add the following code after line 30 of data/celebA_dataset.py, right after model_type is set:

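# Assumes pandas (as pd) and os are already imported at the top of celebA_dataset.py.
# Merge the two native CelebA CSVs on their shared column (the image ID).
df1 = pd.read_csv(os.path.join(self.root_dir, "data", "list_eval_partition.csv"))
df2 = pd.read_csv(os.path.join(self.root_dir, "data", "list_attr_celeba.csv"))
merged_df = pd.merge(df1, df2)
# Expose the official partition (0 = train, 1 = val, 2 = test) as the "split" column.
merged_df["split"] = merged_df["partition"]
# Write the merged file under the name the dataset class expects (metadata_csv_name).
merged_df.to_csv(os.path.join(self.root_dir, "data", metadata_csv_name), header=True, index=False)

molereddy commented 1 year ago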

How do I get the metadata.csv file for MultiNLI?