forrtproject / FReD-data


Discuss procedure for auto-coding variables during data release #13

Open LukasRoeseler opened 1 week ago

LukasRoeseler commented 1 week ago

Original journal names need to be coded for some entries, and ISSNs should be coded for all entries. This is possible with the rcrossref package, but only if the original article has a DOI. We should discuss which other variables we code and how to go about the coding (e.g., have it run every time the dataset is updated, or less often).
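To make the DOI-based lookup concrete, here is a rough sketch of the idea in Python rather than rcrossref. It calls the public Crossref REST API directly (`https://api.crossref.org/works/<doi>`), whose response carries `container-title` and `ISSN` lists under `message`. The function names, the output keys `journal` and `issn`, and the `;` separator are illustrative assumptions, not part of the FReD code base.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work, addressed by DOI.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def parse_crossref_work(record):
    """Pull the journal title and ISSNs out of one Crossref 'works' response.

    Returns None for fields that Crossref does not provide.
    """
    message = record.get("message", {})
    titles = message.get("container-title") or []
    issns = message.get("ISSN") or []
    return {
        "journal": titles[0] if titles else None,
        # Assumption: several ISSNs (print and online) are joined with ';'.
        "issn": ";".join(issns) if issns else None,
    }


def fetch_journal_metadata(doi, timeout=10):
    """Look up journal name and ISSN for a DOI; entries without a DOI stay empty."""
    if not doi:
        return {"journal": None, "issn": None}
    url = CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_crossref_work(json.load(response))
```

Keeping the parsing separate from the network call means the coding step can be tested offline and re-run on cached responses, which may matter if the coding happens on every dataset update.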

LukasWallrich commented 1 week ago

If we format references properly or use an LLM, we can also extract journal names from the references.

More broadly, I agree on coding during data release, including for effect size conversion.

LukasWallrich commented 1 week ago

Also, we should move some data validation and cleaning into that script, e.g., ensure that DOIs never have prefixes and that they are trimmed (no trailing whitespace).
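A minimal sketch of what that DOI cleaning could look like, written in Python for illustration (the same regular expression should port to an R release script). It assumes "prefix" means a resolver URL such as `https://doi.org/` or `http://dx.doi.org/`, or a leading `doi:` label; `clean_doi` and `is_bare_doi` are hypothetical helper names.

```python
import re

# Assumption: the unwanted prefixes are resolver URLs or a leading "doi:" label.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def clean_doi(raw):
    """Return a bare DOI without prefix or surrounding whitespace, or None if empty."""
    if raw is None:
        return None
    doi = raw.strip()
    doi = _DOI_PREFIX.sub("", doi).strip()
    return doi or None


def is_bare_doi(value):
    """Validation check: no surrounding whitespace and starts with the '10.' directory code."""
    return bool(value) and value == value.strip() and value.startswith("10.")


if __name__ == "__main__":
    for raw in [" https://doi.org/10.1000/xyz123 ", "DOI: 10.1000/xyz123", "10.1000/xyz123"]:
        cleaned = clean_doi(raw)
        print(repr(raw), "->", repr(cleaned), "valid:", is_bare_doi(cleaned))
```

Running the cleaning step first and the validation check afterwards would let the release script fail loudly on any entry that still does not look like a bare DOI.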