CancerRegistryOfNorway / nordcanpreprocessing


grade vs. icdo3_grade #14

Closed CotterpinDoozer closed 4 years ago

CotterpinDoozer commented 4 years ago

Preprocessing searches for the variable "icdo3_grade" in the dataset. However, the call for data specifies that this variable should be named "grade", so we need to change the module to look for the correct variable name and avoid the error "the following optional columns were not found in the data: "icdo3_grade"; the tool will still probably work, but in a limited manner".

WetRobot commented 4 years ago

"icdo3_grade" only appears in a message because an R package that is not specific to NORDCAN (https://github.com/WetRobot/iarccrgtools) requires it. Our NORDCAN-specific systems require "grade". The message

"the following optional columns were not found in the data: "icdo3_grade"; the tool will still probably work, but in a limited manner"

is not an error but a note. Still, I'll think about whether anything can be done here.
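To illustrate the difference between an error and a note here, below is a minimal sketch of an optional-column check that reports missing columns without stopping. It is written in Python purely for illustration; the names `OPTIONAL_COLUMNS` and `optional_columns_note` are hypothetical and are not the actual R code of iarccrgtools or nordcanpreprocessing.

```python
# Hypothetical sketch: not the actual code of iarccrgtools or nordcanpreprocessing.
OPTIONAL_COLUMNS = ("icdo3_grade",)


def optional_columns_note(columns, optional=OPTIONAL_COLUMNS):
    """Return a note listing optional columns absent from `columns`, or None if all are present."""
    missing = [name for name in optional if name not in columns]
    if not missing:
        return None  # nothing to report
    quoted = ", ".join(f'"{name}"' for name in missing)
    # An informational note, not an error: processing is expected to continue.
    return (
        "the following optional columns were not found in the data: "
        f"{quoted}; the tool will still probably work, but in a limited manner"
    )


# A NORDCAN-style dataset that names the column "grade" triggers the note.
print(optional_columns_note(["sex", "age", "grade"]))
```

Because the check only looks for the name "icdo3_grade", a dataset that carries the same information under "grade" still produces the note, which is the source of the confusion discussed in this issue.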

CotterpinDoozer commented 4 years ago

The easiest fix is probably for me to rename the variable in the call for data to "icdo3_grade", so that the R package you made for IARCtools (which I suppose you also need for other projects) understands that we have grade included. I guess changing "icdo3_grade" to "grade" in your IARCtools R package would cause more trouble?

WetRobot commented 4 years ago

We rename grade to icdo3_grade internally, so there's no need to change the call for data or any NORDCAN-specific code for this. I think the only issue right now is that the message emitted by package iarccrgtools can confuse people.
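The internal rename can be pictured roughly as follows. This is a Python sketch under stated assumptions: the table is modeled as a column-oriented dict, and `RENAME_FOR_IARCCRGTOOLS` and `rename_columns` are hypothetical names, not the packages' real functions.

```python
# Hypothetical sketch: the mapping and function names are illustrative only.
RENAME_FOR_IARCCRGTOOLS = {"grade": "icdo3_grade"}


def rename_columns(table, mapping=RENAME_FOR_IARCCRGTOOLS):
    """Return a new column-oriented table (dict of name -> values) with columns renamed.

    Columns not listed in `mapping` keep their names. The input table is not modified.
    """
    renamed = {}
    for name, values in table.items():
        new_name = mapping.get(name, name)
        if new_name in renamed:
            # e.g. the data already has both "grade" and "icdo3_grade"
            raise ValueError(f"column name collision on {new_name!r}")
        renamed[new_name] = values
    return renamed


nordcan_data = {"sex": [1, 2], "grade": [1, 9]}
print(rename_columns(nordcan_data))  # {'sex': [1, 2], 'icdo3_grade': [1, 9]}
```

Keeping the rename in one internal mapping means the call for data can keep asking for "grade" while the downstream package still receives the name it expects.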

CotterpinDoozer commented 4 years ago

OK, so in theory you could change the IARCtools package to ask for "grade" instead of "icdo3_grade". I don't think IARCcrgTools cares what the column is named.

I am aware that this is just a notice to the user, and I didn't find it confusing (maybe because we had already discussed it). It is nice to tell the user "well, you don't have grade in your dataset, but that's OK, no big deal". My only point was that IF the user does have "grade" in the dataset, we could actually pass it to IARCtools instead of telling the user that they don't have it.

WetRobot commented 4 years ago

Oh. Well, not every country has grade, so if grade is used whenever it exists, we end up doing different preprocessing depending on the country. I would vote against that.

CotterpinDoozer commented 4 years ago

OK, I see your point, and that's fine with me. You can reject this task, then.

We have already stated in the Call for Data that "grade" is only for JRC, so we don't really need to worry about this any further. I think the main source of confusion was the error messages Charlotte and Marnar got, but we have solved those, and they were not about grade.