globalgov / manystates

Many data on sovereign states
https://globalgov.github.io/manystates
Creative Commons Attribution 4.0 International

Make sure variables are in standard unambiguous format for `consolidate()` to run properly #46

Status: closed (henriquesposito closed this issue 2 years ago)

henriquesposito commented 2 years ago

The consolidate() function assumes that variables with the same name across datasets in a database share the same class/format; otherwise, it throws an error. However, it identifies variables by a common prefix when resolving them. For example, in the states database in the development version of the manystates package, two variables in the ICOW dataset start with the prefix "into" (into and intodDate) and have different classes. consolidate() therefore treats them as the same variable and, because their classes differ, cannot resolve them. If we want to keep both variables in the dataset/database, we need to rename them so that their names no longer share a prefix.
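To make the collision concrete, here is a small sketch written in Python for illustration; it is not manydata's R implementation of consolidate(). It assumes, as described above, that a variable is matched with every variable whose name starts with its name, and that matched variables must share one class. The class assignments ("numeric", "Date") and the replacement name "date_into" are hypothetical.

```python
# Illustrative sketch only, NOT manydata's consolidate() (which is written in R).
# Assumption taken from the issue: variables are matched by a shared name prefix,
# and all matched variables must have the same class to be resolved.


def prefix_groups(classes: dict[str, str]) -> dict[str, list[str]]:
    """Map each variable name to every variable whose name starts with it."""
    return {
        name: sorted(other for other in classes if other.startswith(name))
        for name in classes
    }


def find_conflicts(classes: dict[str, str]) -> list[list[str]]:
    """Return prefix groups whose members do not all share one class."""
    conflicts = []
    for group in prefix_groups(classes).values():
        if len(group) > 1 and len({classes[v] for v in group}) > 1:
            conflicts.append(group)
    return conflicts


# Hypothetical classes for variables in an ICOW-like dataset.
icow = {"label": "character", "into": "numeric", "intodDate": "Date"}
print(find_conflicts(icow))  # [['into', 'intodDate']]

# Renaming one variable (hypothetical new name) removes the shared prefix.
renamed = {"label": "character", "into": "numeric", "date_into": "Date"}
print(find_conflicts(renamed))  # []
```

Prefix-sharing variables of the same class would not be flagged under this assumption; the problem only surfaces when their classes differ, which is why renaming is the suggested fix.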

We also have an issue with the "key" variable for some of the databases in manystates. For example, the datasets in the states database have only one variable in common ("label") that can serve as the "key" variable for consolidate(). We need to make sure that this variable is reliable and not too ambiguous. I would suggest finding at least one more "key" variable, perhaps "COW_Nr".
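The sketch below, again in Python for illustration and not part of manystates or manydata, shows why a second key helps: it lists key values that identify more than one row. The labels "Alpha" and "Beta" and the COW_Nr values are made up; only the variable names "label" and "COW_Nr" come from the discussion above.

```python
from collections import Counter


def ambiguous_keys(rows: list[dict], key: list[str]) -> list[tuple]:
    """Return key values that identify more than one row (i.e. are not unique)."""
    counts = Counter(tuple(row[k] for k in key) for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical rows: two different states that share the same label.
rows = [
    {"label": "Alpha", "COW_Nr": 100},
    {"label": "Alpha", "COW_Nr": 101},
    {"label": "Beta", "COW_Nr": 200},
]

print(ambiguous_keys(rows, ["label"]))            # [('Alpha',)]
print(ambiguous_keys(rows, ["label", "COW_Nr"]))  # []
```

If "label" alone were used to match rows across datasets, the two "Alpha" rows could be paired with the wrong counterparts; the composite key keeps them distinct.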

@BBieri and @jaeltan, could you please look into these variable-level consistency issues and make sure they are resolved across the databases before the next manystates release? Thank you!

BBieri commented 2 years ago

Closing this issue following @henriquesposito's progress on making the consolidate() function handle ambiguous variables.