complexdatacollective / Architect

A tool that builds Network Canvas interviews.
http://www.networkcanvas.com
GNU General Public License v3.0

Case sensitivity of Name and name creates issues with Rosters and validation #785

Closed: berniehogan closed this issue 2 years ago

berniehogan commented 2 years ago

Validation of a node variable takes into account all nodes that have that variable. However, nodes imported from a roster can have the same variable under a different case.

To test this, import nodes from a CSV file that uses "Name" as a column header. Then create a name generator that uses "name" (as recommended) as the variable name. When you test the "unique" validation, it searches only the nodes created by the name generator, which use "name", and not the nodes imported from the CSV resource, which use "Name".
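To make the reported behaviour concrete, here is a minimal TypeScript sketch of a "unique" check that looks values up by the exact, case-sensitive variable name. It is not Interviewer's actual validation code: the `NetworkNode` shape and the `isUniqueExact` helper are hypothetical, and keying attributes by variable name is a simplification assumed only for illustration.

```typescript
// Hypothetical node shape: attributes keyed by variable name (a simplification).
type NodeAttributes = Record<string, unknown>;

interface NetworkNode {
  attributes: NodeAttributes;
}

// Hypothetical "unique" check: a value is unique if no node stores it under
// exactly the same (case-sensitive) variable name.
function isUniqueExact(value: unknown, variable: string, nodes: NetworkNode[]): boolean {
  return !nodes.some((node) => node.attributes[variable] === value);
}

// One node created by the name generator ("name"),
// one node imported from a CSV roster whose header is "Name".
const nodes: NetworkNode[] = [
  { attributes: { name: "Alice" } },
  { attributes: { Name: "Bob" } },
];

console.log(isUniqueExact("Alice", "name", nodes)); // false: collision detected
console.log(isUniqueExact("Bob", "name", nodes)); // true: the roster's "Bob" is missed
```

Because the lookup uses `attributes["name"]`, the imported node's `Name` attribute is never consulted, which matches the symptom described above.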

I'm unsure what to propose, as this could get deep, but my hunch is that we should add variables with an ambiguous type when we add a resource.

Further, and related: what happens when we create other variables (perhaps even with a different data type) that have the same name as columns from the resource? (Untested; exporting and reading the data into other software are also untested.)

jthrilly commented 2 years ago

I'm going to close this one as "by design", because the intent with the roster parsing is for the format to exactly match the codebook. We actually designed it so that the output of an interview could be used as a roster for a second interview, based on the same protocol.

Architect prevents you from creating a variable that differs from an existing one only in case, so the only scenario where this could occur is when the roster comes from an external source. In that case you would either make your data conform to the codebook format, or update the codebook variable names to match your source.
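The rule that Architect rejects a variable differing from an existing one only in case can be sketched as a case-insensitive lookup. This is a hypothetical illustration, not Architect's source; the `conflictsWithExisting` helper and the sample variable list are assumptions.

```typescript
// Hypothetical check: return the existing variable name that collides with
// `candidate` when case is ignored, or undefined if there is no collision.
function conflictsWithExisting(candidate: string, existing: string[]): string | undefined {
  const lowered = candidate.toLowerCase();
  return existing.find((name) => name.toLowerCase() === lowered);
}

// Example codebook variable names (illustrative only).
const codebookVariables = ["name", "age"];

console.log(conflictsWithExisting("Name", codebookVariables)); // logs: name
console.log(conflictsWithExisting("nickname", codebookVariables)); // logs: undefined
```

This check only runs when variables are created in the codebook, so it cannot catch a roster column such as "Name" that arrives later from an external file.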

jthrilly commented 2 years ago

One other thing to note is that adding a network resource doesn't actually create any variables. What happens is that as a node is imported, Interviewer attempts to match the attributes with codebook variables. This happens based on their name and the type they are presumed to be (which is based on parsing all the rows, and making a best guess).

Anything not in the codebook is brought over still, but will be encoded in a general-purpose way. It is up to the user to make the variable and column names match up.
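The import behaviour described above (infer each column's type from all rows, then match columns to codebook variables by name and type) might look roughly like the following. This is a hypothetical sketch: the `inferType` heuristic, the `matchColumns` helper, and the three-type model are assumptions, and Interviewer's real type inference is likely more detailed.

```typescript
type VariableType = "number" | "boolean" | "text";

interface CodebookVariable {
  name: string;
  type: VariableType;
}

interface ColumnMatch {
  column: string;
  inferredType: VariableType;
  matched: boolean; // true only when both name and type agree with a codebook variable
}

// Hypothetical heuristic: guess a column's type by looking at every non-empty value.
function inferType(values: string[]): VariableType {
  const nonEmpty = values.filter((v) => v.trim() !== "");
  if (nonEmpty.length > 0 && nonEmpty.every((v) => v === "true" || v === "false")) return "boolean";
  if (nonEmpty.length > 0 && nonEmpty.every((v) => !Number.isNaN(Number(v)))) return "number";
  return "text";
}

function matchColumns(header: string[], rows: string[][], codebook: CodebookVariable[]): ColumnMatch[] {
  return header.map((column, i) => {
    const inferredType = inferType(rows.map((row) => row[i] ?? ""));
    // Exact, case-sensitive name comparison: "Name" does not match "name".
    const matched = codebook.some((v) => v.name === column && v.type === inferredType);
    return { column, inferredType, matched };
  });
}

const codebook: CodebookVariable[] = [
  { name: "name", type: "text" },
  { name: "age", type: "number" },
];
const header = ["Name", "age"];
const rows = [["Alice", "34"], ["Bob", "29"]];

console.log(matchColumns(header, rows, codebook));
```

Here "age" matches its codebook variable, while "Name" is inferred as text but left unmatched, so it would be carried over in the general-purpose encoding rather than linked to the "name" variable.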

berniehogan commented 2 years ago

If this really is the expected behaviour, then it is unduly punishing to the user (and inconsistent), since the program respects case collisions elsewhere. For example, I cannot create both a "Name" and a "name" variable for my nodes, because the program tells me they are equivalent.

So if that's the case, then the validator should similarly validate against the case-insensitive set of variables, or otherwise give the user a means to discriminate between them. One way might be to edit the variables; I tried that and failed. I went to the resource library to edit the node variable "name" so that it would match the resource, but it prevented me from doing so.
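The first option proposed here, validating against the case-insensitive set of variables, could be sketched as below. This is a hypothetical illustration of the proposal, not code from Architect or Interviewer; `valuesForVariable`, `isUniqueCaseInsensitive`, and the name-keyed attribute shape are assumptions.

```typescript
// Hypothetical attribute shape: keyed by variable name (a simplification).
type NodeAttributes = Record<string, unknown>;

// Collect every value stored under a key that matches `variable` when case is ignored.
function valuesForVariable(attributes: NodeAttributes, variable: string): unknown[] {
  const target = variable.toLowerCase();
  return Object.entries(attributes)
    .filter(([key]) => key.toLowerCase() === target)
    .map(([, value]) => value);
}

// A value is unique if no node holds it under any case variant of `variable`.
function isUniqueCaseInsensitive(value: unknown, variable: string, nodes: NodeAttributes[]): boolean {
  return !nodes.some((attributes) => valuesForVariable(attributes, variable).includes(value));
}

// A name-generator node ("name") and a roster node ("Name").
const mixedNodes: NodeAttributes[] = [{ name: "Alice" }, { Name: "Bob" }];

console.log(isUniqueCaseInsensitive("Bob", "name", mixedNodes)); // false: roster node is now found
console.log(isUniqueCaseInsensitive("Carol", "name", mixedNodes)); // true
```

The trade-off is that a case-insensitive lookup quietly merges columns that the roster author may have meant to keep apart, which is one reason the maintainers prefer requiring roster columns to match codebook names exactly.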

jthrilly commented 2 years ago

I think it is pretty consistent that column names in roster data files should match codebook variable names. This could do with more documentation, though.

Specifically, this page needs to be written: https://documentation.networkcanvas.com/how-to/importing-roster-data/