Open furkmak opened 4 months ago
I have gone through the first five repos:
1) Entity Linking 2022 did not use any 2020 training data 2) Race of Focus uses
In conclusion, those 5 repos use only 2 of the main output files from fb_2020: fb_2020_140m_adid_var1.csv.gz and fb_2020_140m_adid_text_clean.csv.gz. The link on each file name above also points to the files and lines where that file is mentioned in the code.
For Attack like, fb_2020_140m_adid_var.csv.gz and fb_2020_140m_adid_text_clean.csv.gz are used. fb_2020_140m_adid_text_clean.csv.gz specifically is used in several of the scripts in the code folder.
For Issue Classifier, there isn’t any 2020 data we need to upload.
For Ad_Tone, fb_2020/fb_2020_140m_adid_var1.csv.gz is used.
For Ad_Goal_Classifier, fb_2020/fb_2020_140m_adid_text_clean.csv.gz is used.
Overall, these 4 repos use fb_2020_140m_adid_var.csv.gz, fb_2020_140m_adid_text_clean.csv.gz, and fb_2020/fb_2020_140m_adid_var1.csv.gz.
@sheoftensaid will help us upload these files, and @atlasharry and @a-jacewicz can use the links she provides to update the READMEs.
@sheoftensaid, could you let us know once the links are available or once you have added them?
As part of our classification repos, we need to provide any training data we use, including data from the 2020 cycle. Harry, Aleks and I will go through each of the following repos to note whether we use any 2020 training data that must be uploaded to Figshare because of its size (larger than 100 MB):
To reiterate, we do not need to worry about any 2020 training data that is already uploaded to GitHub. We just need to check whether a training script uses 2020 training data that is not readily available on GitHub. If you find any files that we need to know about, just leave a comment here once you have gone through all the repos above. Feel free to share the workload as you see fit.
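If it helps with the check described above, here is a rough sketch of a script that lists which fb_2020 files a repo's scripts mention. It is only a starting point, not how anyone has actually done the review: the function name `find_2020_refs`, the regex, and the list of script extensions are all assumptions, and any hit still needs a manual look to confirm the file is used for training and is missing from GitHub.

```python
import re
from pathlib import Path

# Assumed pattern: any path mentioning an fb_2020 CSV, optionally gzipped.
DATA_REF = re.compile(r"fb_2020[\w/.\-]*?\.csv(?:\.gz)?")
# Assumed script extensions; adjust to what the repos actually contain.
SCRIPT_SUFFIXES = {".py", ".r", ".rmd", ".ipynb"}


def find_2020_refs(repo_root):
    """Map each referenced fb_2020 file name to the (script, line) pairs mentioning it."""
    refs = {}
    root = Path(repo_root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SCRIPT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in DATA_REF.finditer(line):
                # Key on the bare file name so different relative paths group together.
                name = match.group(0).rsplit("/", 1)[-1]
                refs.setdefault(name, []).append((str(path.relative_to(root)), lineno))
    return refs
```

Running `find_2020_refs("path/to/local/clone")` on a local clone returns a dictionary keyed by data file name, which can be pasted into a comment here along with the script locations.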