Decisions regarding how to structure staging tables

Wesleyan-Media-Project / creative_overview

An overview of all repos belonging to the CREATIVE project


Decisions regarding how to structure staging tables #11

Closed Meiqingx closed 11 months ago

Meiqingx commented 1 year ago

The current separation of the text table and the variable table is based on three considerations: 1) potential user needs (researchers focus on different aspects of political advertising), 2) data size limitations on GitHub, and 3) incompatible text variable encoding across different operating systems.

g2022_adid_01062021_11082022_text.csv.gz
g2022_adid_01062021_11082022_var1.csv.gz

That said, we are exploring ways to better accommodate internal data production.

Meiqingx commented 11 months ago

Due to the limitations listed above, we have kept the same data structure of two separate tables: one with text fields and one with non-text values (the two files listed above). When we use them for internal data production (especially running classifiers), I simply merge the two tables into an upstream table for whichever classifier takes an iteration of these variables as input.
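
For anyone who wants to reproduce that merge, here is a minimal sketch using only the Python standard library. It assumes both tables share an identifier column; the column name `ad_id` and the helper names `read_gz_csv` and `merge_staging_tables` are hypothetical and not taken from the project. Files are read with an explicit encoding so the result does not depend on the operating system's default.

```python
import csv
import gzip


def read_gz_csv(path, encoding="utf-8"):
    """Read a gzip-compressed CSV into a list of dict rows."""
    # An explicit encoding avoids relying on the platform default.
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        return list(csv.DictReader(handle))


def merge_staging_tables(text_path, var_path, key="ad_id"):
    """Inner-join the text table and the variable table on `key`.

    `key` is an assumed column name; change it to match the real files.
    """
    # Index the variable table by key for constant-time lookups.
    var_by_key = {row[key]: row for row in read_gz_csv(var_path)}
    merged = []
    for text_row in read_gz_csv(text_path):
        var_row = var_by_key.get(text_row[key])
        if var_row is not None:
            # If a non-key column appears in both tables, the text table wins.
            merged.append({**var_row, **text_row})
    return merged
```

Under those assumptions, calling `merge_staging_tables("g2022_adid_01062021_11082022_text.csv.gz", "g2022_adid_01062021_11082022_var1.csv.gz")` would return one dict per ad present in both files, ready to be written out as a classifier's upstream table.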