CancerRegistryOfNorway / nordcanpreprocessing


Assertions on the number of characters per column #16

Closed WetRobot closed 3 years ago

WetRobot commented 4 years ago

E.g. pat should have at most 15 characters.

It seems that columns with too many characters are causing problems in the iarccrgtools package.

CotterpinDoozer commented 4 years ago

I have already changed this in the call for data. I have also changed it so that the value must consist of digits only, with no letters or other characters.
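For illustration only, a digits-only rule like this could be checked in R roughly as follows. `is_digits_only` is a hypothetical helper and is not taken from any NORDCAN package; the example values are made up.

```r
# Hypothetical helper, not part of nordcancore or nordcanpreprocessing.
is_digits_only <- function(x) {
  stopifnot(is.character(x))
  # TRUE where the value consists of one or more digits 0-9 and nothing else.
  # Empty strings and NA values yield FALSE.
  grepl("^[0-9]+$", x)
}

ids <- c("0012345", "12A45", "123 45", "")
digits_ok <- is_digits_only(ids)
```

Here only the first value passes; the others contain a letter, a space, or nothing at all.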

the-r-man commented 4 years ago

The limit seems to differ between users. For instance, for me and Marnar the limit was 10 characters; 11 did not work. Siri has mentioned that in Norway it worked with more characters.

WetRobot commented 4 years ago

@HuidongTian resolving this issue requires changing the column specifications in nordcancore (https://github.com/CancerRegistryOfNorway/nordcancore/blob/master/data-raw/column_specifications.R) and then testing that character columns that are too long (such as pat) are not allowed. You should probably create a new format for strings of a specific maximum length, e.g. String15.

After making the changes in column_specifications.R, see https://github.com/CancerRegistryOfNorway/nordcancore/blob/master/data-raw/sysdata.R. The internal datasets are used in https://github.com/CancerRegistryOfNorway/nordcancore/blob/master/R/column_specifications.R.

Format-specific checks are implemented in https://github.com/CancerRegistryOfNorway/nordcanpreprocessing/blob/master/R/column_checks.R.
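A minimal sketch of what such a length check might look like, assuming a plain character vector as input. `find_too_long` is a hypothetical name and not an existing function in column_checks.R; the limit of 15 is the one mentioned above for pat.

```r
# Hypothetical helper, not part of nordcanpreprocessing.
# Returns the positions of values that have more than `max_char` characters.
find_too_long <- function(x, max_char) {
  stopifnot(is.character(x), is.numeric(max_char), length(max_char) == 1L)
  # keepNA = TRUE makes nchar() return NA for missing values,
  # and which() then silently skips them.
  n <- nchar(x, type = "chars", keepNA = TRUE)
  which(n > max_char)
}

pat <- c("123456789012345", "1234567890123456", NA_character_)
too_long <- find_too_long(pat, max_char = 15L)
```

In this example only the second value (16 characters) is flagged; the missing value is ignored rather than treated as a violation, which is a design choice that may or may not suit the real checks.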

HuidongTian commented 4 years ago

I plan to add one more argument for the String type, something like 'max_char', which will specify the maximum number of characters allowed for a string column. So, what is the maximum for String? Does it differ between string columns? For example, is the maximum for 'pat' 11 (by the way, which number is set for 'pat': 15, 11 or 10?) while for some other string column it is 50? Currently, the maximum length for strings is 50.
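Roughly, a per-column limit could be sketched like this. The list structure, the column name `other_string_col`, and the function `count_violations` are assumptions for illustration, not the actual nordcancore specification format; the limits 15 and 50 are placeholders until the correct values are confirmed.

```r
# Hypothetical column specifications; the real ones live in nordcancore
# and may be structured differently. The limits are placeholders.
column_specs <- list(
  pat = list(format = "String", max_char = 15L),
  other_string_col = list(format = "String", max_char = 50L)
)

# For each specified column present in the dataset, count the values
# that have more characters than that column's max_char.
count_violations <- function(dataset, specs) {
  col_nms <- intersect(names(specs), names(dataset))
  vapply(col_nms, function(col_nm) {
    n <- nchar(as.character(dataset[[col_nm]]), type = "chars", keepNA = TRUE)
    sum(n > specs[[col_nm]][["max_char"]], na.rm = TRUE)
  }, integer(1L))
}

dataset <- data.frame(
  pat = c("1234567890", "1234567890123456"),
  other_string_col = c("a", "b"),
  stringsAsFactors = FALSE
)
violations <- count_violations(dataset, column_specs)
```

With these placeholder limits, `violations` reports one offending value for pat and none for the other column.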

CotterpinDoozer commented 4 years ago

The maximum length for each variable is given in the call for data, in the "Format/Size" column.

Incidence: https://github.com/CancerRegistryOfNorway/NORDCAN/wiki/Call-for-data---Incidence

Mortality: https://github.com/CancerRegistryOfNorway/NORDCAN/wiki/Call-for-data---Mortality

HuidongTian commented 3 years ago

closes #16