gbif / model-tests

Exploration of sample models

Duplicated identifiers #19

Closed MortenHofft closed 2 years ago

MortenHofft commented 2 years ago

I'm trying to replicate https://arctos.database.museum/guid/DMNS:Mamm:11098

I notice that the identifiers for the parasite relationship appear in two places.

I can ask for the identifiers of the entity via entityIdentifiersByEntityId. But I can also ask for entityRelationshipsBySubjectEntityId and get the externalObjectEntityId value. Both will point me to e.g. http://arctos.database.museum/guid/DMNS:Para:64.

It looks to me like it shouldn't be in the identifiers table, but only in the relationships.

query {
  # the entity can have many IDs, so we need to ask for the entity through an entity identifiers table
  specimensIDs: allEntityIdentifiers(condition: {
    entityIdentifier: "https://arctos.database.museum/guid/DMNS:Mamm:11098"
  }) {
    # there should only be one entity with this ID
    nodes {
      entityId
      entityIdentifier
      entityIdentifierType
      specimen: entityByEntityId {
        entityId
        entityType

        # Identifiers section
        # This is missing 2/5 columns, "relationship" and "ID value"; "assignedBy" is missing too. But they might be inferred from the others?
        entityIdentifiersByEntityId {
          nodes {
            entityIdentifier
            entityIdentifierType
          }
        }
        # above identifiers include the parasites
        # I can also get to those via the related entities path like below
        entityRelationshipsBySubjectEntityId {
          nodes {
            externalObjectEntityId
          }
        }

      }
    }
  }
}
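
To make the overlap concrete, here is a minimal Python sketch that walks a response shaped like the query above and lists every identifier that shows up both under entityIdentifiersByEntityId and as an externalObjectEntityId. The function name and the sample response are hypothetical: the response is hand-written to mirror the query's aliases and fields, not copied from the server.

```python
def overlapping_identifiers(response):
    """Return identifiers an entity lists both as its own and as a relationship target."""
    overlaps = []
    for node in response["data"]["specimensIDs"]["nodes"]:
        specimen = node["specimen"]
        # Identifiers attached directly to the entity
        own = {n["entityIdentifier"]
               for n in specimen["entityIdentifiersByEntityId"]["nodes"]}
        # Identifiers reachable through the relationships path
        related = {n["externalObjectEntityId"]
                   for n in specimen["entityRelationshipsBySubjectEntityId"]["nodes"]}
        overlaps.extend(sorted(own & related))
    return overlaps


# Hypothetical response, written by hand to match the query's shape (assumption).
sample_response = {
    "data": {
        "specimensIDs": {
            "nodes": [
                {
                    "specimen": {
                        "entityIdentifiersByEntityId": {
                            "nodes": [
                                {"entityIdentifier": "https://arctos.database.museum/guid/DMNS:Mamm:11098"},
                                {"entityIdentifier": "http://arctos.database.museum/guid/DMNS:Para:64"},
                            ]
                        },
                        "entityRelationshipsBySubjectEntityId": {
                            "nodes": [
                                {"externalObjectEntityId": "http://arctos.database.museum/guid/DMNS:Para:64"},
                            ]
                        },
                    }
                }
            ]
        }
    }
}

duplicates = overlapping_identifiers(sample_response)
```

Any identifier returned here is a candidate for living only in the relationships table.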
tucotuco commented 2 years ago

I'm not sure I understand the issue expressed here. The Identifiers table should only represent relationships to self, whereas the EntityRelationships would take care of relationships to other things.

On Wed, Jun 15, 2022 at 8:06 AM Morten Høfft wrote:

I notice that the identifiers for the parasite relationship appear in two places.

I can ask for the identifiers of the entity via entityIdentifiersByEntityId. But I can also ask for entityRelationshipsBySubjectEntityId and get the externalObjectEntityId value. Both will point me to e.g. http://arctos.database.museum/guid/DMNS:Para:64.


MortenHofft commented 2 years ago

https://github.com/timrobertson100/model-tests/blob/c64965fc18b7c0df150133d44b97fbe4949d717e/arctos/files/entity_identifier.csv

line  entityID  entityIdentifier                                     entityIdentifierType
4     21714980  https://arctos.database.museum/guid/DMNS:Mamm:11098  Arctos object
22    21714980  http://arctos.database.museum/guid/DMNS:Para:664     DMNS:Para

The Identifiers table should only represent relationships to self

I'm not sure I get that. Entity 21714980 has 2 identifiers that seem to point to different organisms. The types are different though, which makes me think the type was intended to indicate the type of relationship (that one was a parasite of the other). But since that also appears in the relationship table, I got confused: https://github.com/timrobertson100/model-tests/blob/4a6ef6f4c9fe3e76238ae51fde985d7097427463/arctos/files/entity_relationship.csv

timrobertson100 commented 2 years ago

Isn't this just a data mistake in the CSV, where http://arctos.database.museum/guid/DMNS:Para:664 is being added to the wrong entity?

MortenHofft commented 2 years ago

That is the question I'm posing, yes.

It looks to me like it shouldn't be in the identifiers table, but only in the relationships.

tucotuco commented 2 years ago

Those references to other entities are spurious here. They belong only in the EntityRelationship table. Fixed with https://github.com/timrobertson100/model-tests/commit/37f212580606944a29b5c1da116eff4b88c5386e.
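
The commit is not reproduced here, but a cleanup of this kind could be sketched in Python as below. It is only an illustration under assumptions: the helper name is hypothetical, the CSV is assumed to be comma-delimited with the entityID, entityIdentifier and entityIdentifierType columns, and the set of relationship targets is supplied by hand rather than read from entity_relationship.csv.

```python
import csv
import io


def drop_related_identifiers(identifier_csv, related_ids):
    """Return the identifier CSV without rows whose identifier is a relationship target."""
    reader = csv.DictReader(io.StringIO(identifier_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only identifiers that refer to the entity itself
        if row["entityIdentifier"] not in related_ids:
            writer.writerow(row)
    return out.getvalue()


# Two rows from the thread, in an assumed comma-delimited layout.
identifier_csv = (
    "entityID,entityIdentifier,entityIdentifierType\n"
    "21714980,https://arctos.database.museum/guid/DMNS:Mamm:11098,Arctos object\n"
    "21714980,http://arctos.database.museum/guid/DMNS:Para:664,DMNS:Para\n"
)
# Identifiers that belong to other entities (illustrative, supplied by hand).
related_ids = {"http://arctos.database.museum/guid/DMNS:Para:664"}

cleaned = drop_related_identifiers(identifier_csv, related_ids)
```

After this pass the entity keeps only its own Arctos identifier, and the parasite link remains expressed solely through the relationship rows.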