complexdatacollective / Server

A tool for storing, analyzing, and exporting Network Canvas interview data.
http://networkcanvas.com/
GNU General Public License v3.0
2 stars 2 forks source link

use codebook option values for categorical variables #337

Closed rebeccamadsen closed 2 years ago

rebeccamadsen commented 2 years ago

Fixes #336.

The issue here turned out to be on the import side of the pipeline. We use graphml attr.type to determine how to process values, whether they need to be cast as numbers or something. But categorical variables are of type boolean in graphml because we format them as variableName_optionValue. When we parse the underscores to pull out the optionValue, we then were left with a string. This is fine, except when option values are numerical. (e.g. a variable named ind_K6a_3 would have a value of "3" when it should be 3)

In this fix, I opted to leverage the codebook and lean on the codebook option value so that we have type correct, since the codebook is the definitive source.

The secondary mystery here is that charts seemed fully functional either way. It turns out we've been casting option values as strings all along:

https://github.com/complexdatacollective/Server/blob/master/src/renderer/containers/workspace/withAnswerDistributionCharts.js#L49

So do we fix this and not cast them, or leave it because it's not harming anything? Removing the toString here makes the charts correct, but if it isn't harming anything, then are users currently seeing what they expect? Or would fixing this alert them that they need to re-import some graphmls?

jthrilly commented 2 years ago

Thanks for another great find and fix @rebeccamadsen!

Am I right in thinking this means that data has not been imported correctly for some existing users in some cases? Is there anything we can do to fix existing data/prevent data loss?

So do we fix this and not cast them, or leave it because it's not harming anything? Removing the toString here makes the charts correct, but if it isn't harming anything, then are users currently seeing what they expect? Or would fixing this alert them that they need to re-import some graphmls?

Assuming I am following correctly, casting as a string in the graphs is just masking the issue, and we shouldn't do it. So I would suggest we remove the casting here.

rebeccamadsen commented 2 years ago

Yes, this would mean data has not been imported correctly for some existing users. I'll take a look and see if we can pull in a fix for that scenario. But that data doesn't look any different than any other once it's been imported, which would mean checking all the data for consistent types every time, or maybe a menu option to have corrections be user requested?? I'll have to consider other options.

I'll remove the casting.

jthrilly commented 2 years ago

I think a one off "fix" that could be run optionally from a menu or otherwise would be great.

rebeccamadsen commented 2 years ago

A fix to add a menu option has been included here.