Closed by pipliggins 1 year ago
This can be done with the current adtl functionality. Here is a minimal working example, long.csv:
case_id,field,value
1,sex,1
1,age,20
2,sex,2
2,age,25
With the following TOML file, long.toml:
[adtl]
name = "long"
description = "Convert a long table to wide"
[adtl.tables]
cases = { kind = "groupBy", groupBy = "id", aggregation = "lastNotNull" }
[adtl.defs."Y/N/NK".values]
1 = true
2 = false
[cases]
pathogen = "COVID-19"
[cases.id]
field = "case_id"
[cases.sex_at_birth]
field = "value"
description = "Sex at Birth"
values = { 1 = "male", 2 = "female", 3 = "non_binary" }
if = { field = "sex" }
[cases.age]
field = "value"
description = "Age"
if = { field = "age" }
Running adtl long.toml long.csv should produce long-cases.csv:
age,id,pathogen,sex_at_birth
20,1,COVID-19,male
25,2,COVID-19,female
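For anyone unfamiliar with the groupBy table kind, here is a rough Python sketch of the long-to-wide behaviour the example relies on. It is not adtl's implementation: the function name `long_to_wide` and the simplified "last non-empty value wins" rule are assumptions made for illustration, and the sex mapping simply mirrors the `values` entry in long.toml.

```python
import csv
import io

LONG_CSV = """case_id,field,value
1,sex,1
1,age,20
2,sex,2
2,age,25
"""

# Mirrors the `values` mapping on cases.sex_at_birth in long.toml
SEX_VALUES = {"1": "male", "2": "female", "3": "non_binary"}


def long_to_wide(text):
    """Group long rows by case_id, keeping the last non-empty value per field.

    Hypothetical helper for illustration only; not part of adtl.
    """
    cases = {}
    for row in csv.DictReader(io.StringIO(text)):
        case = cases.setdefault(
            row["case_id"], {"id": row["case_id"], "pathogen": "COVID-19"}
        )
        value = row["value"]
        if value == "":
            continue  # empty values never overwrite earlier data
        if row["field"] == "sex":
            case["sex_at_birth"] = SEX_VALUES.get(value)
        elif row["field"] == "age":
            case["age"] = value
    return list(cases.values())


wide = long_to_wide(LONG_CSV)
for case in wide:
    print(case)
```

Each input row only contributes the attribute whose condition it matches, which is the same role the `if = { field = "..." }` entries play in the TOML.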
Ah okay, missed this! I'll have a go with the Italian CORE data.
There's an issue here with combinedType data being overwritten as rows are iterated through. I haven't quite pinned down what's happening, but, for example, ethnicity always ends up being returned as [None], even though data is present and is found initially.
Also, we're going to have to revisit linking different rows according to a field. The date for an observation will be on a different row from the observation itself, so it can't be found in a single pass; the rows are linked by the 'PROGRESSIVE_DAILY' column.
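One way to picture the linking problem is a two-pass approach: first bucket every row under a (subject, day) key, then resolve the date for each bucket and attach it to the other rows in it. The sketch below is only an illustration. Apart from PROGRESSIVE_DAILY, every column and field name (`subject_id`, `field`, `value`, `date`, `temperature`) is hypothetical, as is the `link_observations` helper; the real CORE names may differ.

```python
from collections import defaultdict

# Hypothetical long-format rows; only PROGRESSIVE_DAILY is a name taken from the data
rows = [
    {"subject_id": "A1", "PROGRESSIVE_DAILY": "1", "field": "date", "value": "2020-03-01"},
    {"subject_id": "A1", "PROGRESSIVE_DAILY": "1", "field": "temperature", "value": "38.2"},
    {"subject_id": "A1", "PROGRESSIVE_DAILY": "2", "field": "temperature", "value": "37.5"},
    {"subject_id": "A1", "PROGRESSIVE_DAILY": "2", "field": "date", "value": "2020-03-02"},
]


def link_observations(rows, date_field="date"):
    """Attach each day's date to the observations recorded for that day."""
    # Pass 1: bucket rows by (subject, day) so row order no longer matters
    groups = defaultdict(list)
    for row in rows:
        groups[(row["subject_id"], row["PROGRESSIVE_DAILY"])].append(row)

    # Pass 2: find the date row in each bucket and copy it onto the other rows
    observations = []
    for (subject, _day), members in groups.items():
        date = next((r["value"] for r in members if r["field"] == date_field), None)
        for r in members:
            if r["field"] != date_field:
                observations.append(
                    {"subject_id": subject, "date": date, "name": r["field"], "value": r["value"]}
                )
    return observations


observations = link_observations(rows)
for obs in observations:
    print(obs)
```

Note that on day 2 the date row comes after the observation, which is exactly the case a single pass over the file cannot handle.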
As long-format data only comes up once, we'll transform the data rather than adding a new table type.
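A minimal sketch of what such a transformation could look like, assuming a five-column long file. The column names (`id`, `section`, `day`, `field`, `value`), the sample rows, and the `pivot_long_to_wide` helper are all hypothetical; this is not the script that was actually used. It pivots the long rows into one wide row per (id, section, day) using only the standard library, so that the result can be handled like any other wide CSV.

```python
import csv
import io

# Hypothetical sample in long form; real column names will differ
LONG = """id,section,day,field,value
A1,daily,1,date,2020-03-01
A1,daily,1,temperature,38.2
A1,daily,2,date,2020-03-02
A1,daily,2,temperature,37.5
"""


def pivot_long_to_wide(text, keys=("id", "section", "day")):
    """Pivot long rows into one wide row per unique combination of `keys`."""
    wide = {}
    fields = []  # field names in first-seen order, used as the wide columns
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[k] for k in keys)
        record = wide.setdefault(key, dict(zip(keys, key)))
        record[row["field"]] = row["value"]
        if row["field"] not in fields:
            fields.append(row["field"])

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*keys, *fields], restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(wide.values())
    return out.getvalue()


wide_csv = pivot_long_to_wide(LONG)
print(wide_csv)
```

Fields missing for a given key are written as empty cells (`restval=""`), so sparse long data still produces a rectangular table.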
Italian CORE data is in long, rather than wide, form. We will need a new table kind for this. Data format: ID | Form section | Day number | Field name | Value