complexdatacollective / Server

A tool for storing, analyzing, and exporting Network Canvas interview data.
http://networkcanvas.com/
GNU General Public License v3.0

Entity resolution v1 reintegration #325

Closed (wwqrd closed 3 years ago)

wwqrd commented 3 years ago

This takes the modules built during the original entity resolution work and reintegrates them with Server.

Entity Resolution now takes place on its own screen, not as part of the Export Screen.

The backend could do with some more unit tests, but I think this is in a good place for beta/developer testing.

The sample repo has been heavily updated with guidance on how to use this feature, along with an example script: https://github.com/complexdatacollective/entity-resolution-sample. A compatible example protocol is specified there: https://github.com/complexdatacollective/entity-resolution-sample/tree/master/protocol.

Feedback

wwqrd commented 3 years ago

There's a work-in-progress README in the entity resolution sample repo. It also outlines the I/O, though it probably needs some more elaboration: https://github.com/complexdatacollective/entity-resolution-sample/blob/master/README.md

wwqrd commented 3 years ago

Case ID and session UUID aren't correct (nc:caseId="AsiZrUAiAP1H7KlW" nc:sessionUUID="entity_resolution"). Case ID should be "Entity Resolution", and session UUID should be a UUID.

Any thoughts what the UUID should be? Using the ID of the exported resolution currently.

wwqrd commented 3 years ago

Ego variables and alter type variables appear separately, although ego is 'cast' to an alter type

This is by design, probably best to have some sort of meeting to discuss.

Edit: The short version is that unresolved egos aren't included in the network. We recently decided to add these "orphan" egos back in as their own node type.
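
A minimal sketch of that behaviour, in Python for brevity. The data shapes, the `ego` type name, and the function `add_orphan_egos` are hypothetical and only illustrate the idea; they are not taken from Server's code.

```python
# Hypothetical node type used for egos that were not resolved into an alter.
ORPHAN_EGO_TYPE = "ego"


def add_orphan_egos(nodes, egos, resolved_ego_ids):
    """Return the resolved nodes plus one extra node per unresolved ego.

    nodes: list of dicts with "id", "type" and "attributes" keys (assumed shape)
    egos: list of dicts with "id" and "attributes" keys (assumed shape)
    resolved_ego_ids: set of ego ids that were merged into an alter node
    """
    orphans = [
        {
            "id": ego["id"],
            "type": ORPHAN_EGO_TYPE,
            "attributes": dict(ego["attributes"]),  # copy, don't share state
        }
        for ego in egos
        if ego["id"] not in resolved_ego_ids
    ]
    return list(nodes) + orphans
```

With two egos of which only one was resolved, the result holds the existing nodes followed by a single node of the orphan type for the other ego.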

jthrilly commented 3 years ago

Any thoughts what the UUID should be? Using the ID of the exported resolution currently.

It should be a new UUID, IMO. This is a new session conceptually.

Ego variables and alter type variables appear separately, although ego is 'cast' to an alter type

This is by design, probably best to have some sort of meeting to discuss.

Yeah, I see where you're coming from. I suppose what confuses me is that during the resolution, the user sees ego cast as a "node". So say both ego and 'person' have a 'name' attribute. Wouldn't it make sense for the user to be able to resolve what the "name" of the resolved node actually is? Right now, they can only have both the name from the 'person' node AND the name on the ego node, without merging them.
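
To illustrate the kind of merge being asked for, here is a rough Python sketch. The function names, the plain-dict attribute shape, and the `choices` argument (standing in for what the user picks on the resolution screen) are all assumptions for illustration, not how Server currently works.

```python
def find_conflicts(ego_attrs, node_attrs):
    """Attributes present on both the ego and the node with different values."""
    shared = ego_attrs.keys() & node_attrs.keys()
    return {
        key: (ego_attrs[key], node_attrs[key])
        for key in shared
        if ego_attrs[key] != node_attrs[key]
    }


def merge_attributes(ego_attrs, node_attrs, choices):
    """Merge ego and node attributes into one set for the resolved node.

    Attributes that do not conflict are carried over as they are. For every
    conflicting attribute, `choices` must supply the value the user selected.
    """
    merged = {**ego_attrs, **node_attrs}
    for key in find_conflicts(ego_attrs, node_attrs):
        if key not in choices:
            raise ValueError(f"No value chosen for conflicting attribute {key!r}")
        merged[key] = choices[key]
    return merged
```

In this sketch, an ego and a 'person' node that both carry a 'name' would surface 'name' as a conflict, and the resolved node would end up with a single 'name' instead of two.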

wwqrd commented 3 years ago

It should be a new UUID, IMO. This is a new session conceptually.

Okay, I'll have it generate one when the resolution is saved. That way it can be a proper UUID while still being reproducible.
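
Roughly what I have in mind, sketched in Python rather than Server's own code. The `save_resolution` and `session_attributes` helpers and the list-based store are hypothetical; only the `nc:caseId` and `nc:sessionUUID` attribute names and the "Entity Resolution" case ID come from the discussion above.

```python
import uuid


def save_resolution(store, resolution):
    """Persist a resolution, assigning a UUID once at save time.

    `store` is a plain list standing in for the real database (assumption).
    """
    record = dict(resolution)
    # Only generate a UUID if the record does not already have one, so
    # re-saving never changes it.
    record.setdefault("uuid", str(uuid.uuid4()))
    store.append(record)
    return record


def session_attributes(record):
    """Session attributes for an export of a saved resolution."""
    return {
        "nc:caseId": "Entity Resolution",
        # Read back from the saved record, so every export of the same
        # resolution carries the same session UUID.
        "nc:sessionUUID": record["uuid"],
    }
```

Because the UUID is generated at save time rather than at export time, exporting the same resolution twice gives the same session UUID.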

wwqrd commented 3 years ago

BUG: removing the resolution and then running another resolution shows the old number of resolved nodes in the summary

I'm not able to reproduce this one. The resolutions are listed in descending date order; might that be it?

wwqrd commented 3 years ago

There is an orphan data node right after the graph element: undefined. Suspect this is a network-exporters bug due to having no ego entity

This is fixed by https://github.com/complexdatacollective/network-exporters/pull/27

jthrilly commented 3 years ago

Looks like the CI build error has been encountered by others: https://stackoverflow.com/questions/66331991/attempted-import-error-default-is-not-exported-from-assertthisinitialized