Wesleyan-Media-Project / creative_overview

An overview of all repos belonging to the CREATIVE project

Large 2020 Training Data #18

Open · furkmak opened this issue 4 months ago

furkmak commented 4 months ago

As part of our classification repos, we need to provide any training data we use, including data from the 2020 cycle. Harry, Aleks, and I will go through each of the following repos and note whether it uses any 2020 training data that we need to upload to Figshare because of its size (larger than 100 MB):

To reiterate, we do not need to worry about any 2020 training data that is already uploaded to GitHub. We only need to check whether a training script uses 2020 training data that is not readily available on GitHub. If you find any such files, leave a comment here once you have gone through all the repos above. Feel free to share the workload as you see fit.
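A first pass over each repo can be scripted. The sketch below is a hypothetical helper, not part of any CREATIVE repo. It assumes that training scripts are .py, .R, .sh, or .ipynb files and that 2020 data is referenced by file names starting with fb_2020 and ending in .csv or .csv.gz. It lists every such reference with its script and line number and can flag a local copy that exceeds GitHub's 100 MB per-file limit. It only narrows the search. Whether a file is already on GitHub still has to be checked by hand.

```python
import re
from pathlib import Path

# GitHub rejects files larger than 100 MiB; those have to go to Figshare instead.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024

# File types treated as "training scripts" (an assumption; adjust per repo).
SCRIPT_SUFFIXES = {".py", ".R", ".sh", ".ipynb"}

# Assumed naming convention: 2020 data files start with fb_2020 and end in
# .csv or .csv.gz, e.g. fb_2020/fb_2020_140m_adid_var1.csv.gz.
FB_2020_PATTERN = re.compile(r"fb_2020[\w./-]*\.csv(?:\.gz)?")


def find_2020_references(repo_root):
    """Return {data file name: [(script path, line number), ...]} for one repo."""
    refs = {}
    for path in sorted(Path(repo_root).rglob("*")):
        # Skip directories and anything that is not an assumed script type.
        if not path.is_file() or path.suffix not in SCRIPT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in FB_2020_PATTERN.findall(line):
                refs.setdefault(name, []).append((str(path), lineno))
    return refs


def too_large_for_github(data_file):
    """True if a local copy of a data file exceeds GitHub's per-file limit."""
    return Path(data_file).stat().st_size > GITHUB_FILE_LIMIT_BYTES
```

Run find_2020_references once per repo and compare each listed file name against what is already on GitHub. Any file that is missing there, or that too_large_for_github flags, is a candidate for Figshare.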

atlasharry commented 4 months ago

I have gone through the first five repos:

1) Entity Linking 2022 did not use any 2020 training data.
2) Race of Focus uses

In summary, these five repos use only two of the main output files from fb_2020: fb_2020_140m_adid_var1.csv.gz and fb_2020_140m_adid_text_clean.csv.gz. The link on each file above also points to the files and lines where that file is referenced in the code.

a-jacewicz commented 4 months ago
  1. For Attack like, fb_2020_140m_adid_var.csv.gz and fb_2020_140m_adid_text_clean.csv.gz are used. fb_2020_140m_adid_text_clean.csv.gz specifically is used in several of the scripts in the code folder.

  2. For Issue Classifier, there isn’t any 2020 data we need to upload.

  3. For Ad_Tone, fb_2020/fb_2020_140m_adid_var1.csv.gz is used.

  4. For Ad_Goal_Classifier, fb_2020/fb_2020_140m_adid_text_clean.csv.gz is used.

Overall, these four repos use fb_2020_140m_adid_var.csv.gz, fb_2020_140m_adid_text_clean.csv.gz, and fb_2020/fb_2020_140m_adid_var1.csv.gz.

furkmak commented 4 months ago

@sheoftensaid will help us upload these files, and @atlasharry and @a-jacewicz can then use the links she provides to update the READMEs.

SebastianZimmeck commented 4 months ago

@sheoftensaid, could you let us know once the links are available and you have added them?