MachineVisionUiB / machinevision

We are developing a database to map and interpret the representations and uses of machine vision technologies in digital art, computer games and narratives such as science fiction novels, movies and creepypasta.
http://uib.no/en/machinevision

Taxonomy classes with identical ID numbers #156

Closed LindaKairus closed 2 years ago

LindaKairus commented 2 years ago

Working with both a website export of situations and work_long_v2.csv, I found that different classes (e.g. verbs and characters, or works and topics) have identical ID numbers. This causes problems when importing the data into Gephi. Apparently the same problem occurred earlier, and Jill wrote "we were adding numbers BEFORE the node ID". This then needs to be done for the final exports as well.
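
To make the problem concrete, here is a minimal Python sketch of how such collisions could be detected before importing into Gephi. The function name `find_id_collisions` and the example IDs are hypothetical and are not taken from the real export; the sketch only assumes that each class (work, topic, verb, ...) numbers its items independently.

```python
from collections import defaultdict


def find_id_collisions(ids_by_type):
    """Return {id: [entity types]} for every id used by more than one entity type."""
    seen = defaultdict(set)
    for entity_type, ids in ids_by_type.items():
        for entity_id in ids:
            seen[entity_id].add(entity_type)
    # Keep only the ids that appear under at least two different types.
    return {i: sorted(types) for i, types in seen.items() if len(types) > 1}


# Hypothetical ids, not taken from the real export.
example = {
    "work": [1626, 1627, 170],
    "topic": [170, 171],
    "verb": [12, 1627],
}
collisions = find_id_collisions(example)
for entity_id, types in sorted(collisions.items()):
    print(entity_id, "is shared by", ", ".join(types))
```

Any ID reported here would be merged into a single node by a tool that treats the ID column as the node key.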

steinmb commented 2 years ago

Thank you for testing the data export :)

"we were adding numbers BEFORE the node ID"

Not sure what she means. What number is she referring to?

Snippet from work_long_v2.csv

| Title | ID | Year | Country | Genre | Genre ID | Technologies referenced | Technologies referenced id | Technologies used | Technologies used id | Topic | Topic id | Sentiment | Sentiment id | Situation machine vision is used in | Situation machine vision is used in ID | Characters | Characters ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SEER: Simulative Emotional Expression Robot | 1626 | 2018 | Japan | Art | 2 | AI (General Purpose Artificial Intelligence) | 152 | Emotion recognition | 91 | Playful | 170 | Exciting | 97 | SEER: Simulative Emotional Expression Robot (mimicing facial expressions) | 1628 | SEER Robot | 1627 |

steinmb commented 2 years ago

Ah, I think I understand. We were padding the node ID with xxx to make sure they were unique?

jilltxt commented 2 years ago

Yes, that's right! I think we put 900 as a prefix to situations, 800 to verbs etc - or something like that.
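
As a rough illustration of that prefixing idea, here is a minimal Python sketch. The `PREFIXES` mapping, the `prefixed_id` helper and the fixed width of 5 digits are assumptions made for the example; they are not the values or the code used in the project's export.

```python
# Hypothetical prefixes, loosely following "900 for situations, 800 for verbs".
PREFIXES = {"situation": "900", "verb": "800"}


def prefixed_id(entity_type, raw_id, width=5):
    """Prepend a per-type prefix to a zero-padded id.

    Zero-padding to a fixed width keeps every result the same length, so two
    different (type, id) pairs can never produce the same number.
    """
    if raw_id < 0 or raw_id >= 10 ** width:
        raise ValueError(f"id {raw_id} does not fit in {width} digits")
    return int(PREFIXES[entity_type] + str(raw_id).zfill(width))


# The same raw id now maps to a different export id per entity type.
situation_id = prefixed_id("situation", 1628)
verb_id = prefixed_id("verb", 1628)
print(situation_id, verb_id)
```

The original ID stays readable at the end of the number, which makes it easy to trace an exported row back to the database.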

steinmb commented 2 years ago

commit MachineVisionUiB/machinevision_config@88f84b1a6d54c0026da939556734f7772caacd09 (HEAD -> master, origin/master, origin/HEAD)
Author: Stein Magne Bjorklund steinmb@smbjorklund.com
Date: Wed Dec 8 12:14:46 2021 +0100

Issue 156 Avoid taxonomy term id collision with node id

Separate entity types like taxonomy and node each use their own
number series for their unique id. When we expose ids from both of
them in a data export, they might naturally be identical.

To avoid that, all creative work taxonomy term ids are padded with
the value 2000. All node ids are below 10000.

Updated views are found at:

I am currently running new exports, but have a look at the pages above. They reflect the final data export.

@jilltxt and I also discussed using the UUID string instead of this integer. That would give us a truly unique ID, and we would not have to rewrite the real ID like this for every taxonomy term ID. It is long, though; that is why it is unique not only within the Machine Vision project but also across any other object on planet Earth. Gephi might not like a string as an ID, though we will not know before we try. Because the UUID is so long, sorting by it in Excel (do we ever do that?) is impossible.
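
For reference, a minimal Python sketch of what a UUID-keyed node list could look like. The `nodes_csv` helper and the `Id`/`Label` column names are assumptions for illustration, and the UUIDs are freshly generated here rather than read from the site; whether Gephi handles such string IDs well is still something to test.

```python
import csv
import io
import uuid


def nodes_csv(labels):
    """Build a small nodes table as CSV text, keyed by random UUID strings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Label"])  # assumed column names
    for label in labels:
        # uuid4() gives a random 128-bit id; its string form is 36 characters.
        writer.writerow([str(uuid.uuid4()), label])
    return buf.getvalue()


table = nodes_csv(["SEER Robot", "Playful"])
print(table)
```

Each ID is 36 characters long, which is the trade-off mentioned above: no rewriting per entity type, but a much less readable column.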

steinmb commented 2 years ago

Found one display that was missing the padding:

commit MachineVisionUiB/machinevision_config@77a60e18b2bba37d3961c9fa958dbe1d67c12f66
Author: Stein Magne Bjorklund steinmb@smbjorklund.com
Date: Wed Dec 8 12:40:44 2021 +0100

Issue 156 Creative work wide export tax term padding missing
jilltxt commented 2 years ago

I think this is now fixed? Looks OK to me. Thanks!