biolink / kgx

KGX is a Python library for exchanging Knowledge Graphs
https://kgx.readthedocs.io
BSD 3-Clause "New" or "Revised" License

How to handle BNodes in workflow? #107

Closed lhannest closed 4 years ago

lhannest commented 5 years ago

There are 6,222 nodes in the Neo4j database with BNODE as their id prefix, for example:

name    germline variant of NT5C2
provided_by    [orphanet.ttl]
id    BNODE:bb7f86ea9b5ffee1
category    [sequence variant]

@deepakunni3 suggested that the presence of BNodes was a temporary solution that we should now aim to fix with KGX. @cmungall how should I handle these nodes in the workflow I'm developing?

cmungall commented 5 years ago

Blank nodes should be excluded. But it's good to see what we are missing and feed that back to the dipper team. What is connected by this bnode?
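As a rough illustration of excluding these nodes while still keeping a record of what they connected, here is a minimal sketch in plain Python. It does not use the KGX API; the dict/tuple graph layout, the function names, and the `EX:` ids and `related_to` edges are all assumptions made up for the example. Only the `BNODE:bb7f86ea9b5ffee1` node comes from this issue.

```python
from typing import Dict, List, Tuple

# An edge is (subject id, edge type, object id). This layout is an assumption
# for the sketch, not how KGX stores a graph.
Edge = Tuple[str, str, str]


def is_bnode_id(node_id: str) -> bool:
    """True when the CURIE prefix (the text before the first ':') is BNODE."""
    return node_id.split(":", 1)[0] == "BNODE"


def exclude_bnodes(nodes: Dict[str, dict], edges: List[Edge]):
    """Drop BNODE nodes and every edge touching one.

    The dropped edges are returned as well, so they can be reviewed
    and reported upstream instead of being silently lost.
    """
    kept_nodes = {nid: props for nid, props in nodes.items() if not is_bnode_id(nid)}
    kept_edges, dropped_edges = [], []
    for edge in edges:
        subject, _, obj = edge
        if is_bnode_id(subject) or is_bnode_id(obj):
            dropped_edges.append(edge)
        else:
            kept_edges.append(edge)
    return kept_nodes, kept_edges, dropped_edges


# The BNODE node is the one quoted in this issue; EX:1 and EX:2 are hypothetical.
nodes = {
    "BNODE:bb7f86ea9b5ffee1": {"name": "germline variant of NT5C2"},
    "EX:1": {},
    "EX:2": {},
}
edges = [
    ("BNODE:bb7f86ea9b5ffee1", "related_to", "EX:1"),  # hypothetical edge
    ("EX:1", "related_to", "EX:2"),                    # hypothetical edge
]

kept_nodes, kept_edges, dropped_edges = exclude_bnodes(nodes, edges)
```

Returning the dropped edges separately is a deliberate choice here: it keeps the "what are we missing" question answerable after the filter has run.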

lhannest commented 5 years ago

It has just one edge: BNODE:bb7f86ea9b5ffee1 causes_condition MONDO:0013165

>>> match (n) where split(n.id, ':')[0] = 'BNODE' match (n)-[e]-(m) return n.id, type(e), m.id limit 30;
n.id | type(e) | m.id
BNODE:NCBITaxon:9606genome | subclass_of | SO:0001026
BNODE:feature600273 | causes_condition | MONDO:0010856
BNODE:feature614233 | causes_condition | MONDO:0013648
BNODE:feature600202 | contributes_to | MONDO:0010843
BNODE:feature601808 | causes_condition | MONDO:0011147
BNODE:feature600257 | causes_condition | MONDO:0010852
BNODE:feature203650 | causes_condition | MONDO:0021035
BNODE:feature609029 | causes_condition | MONDO:0012176
BNODE:feature190685 | causes_condition | MONDO:0008608
BNODE:feature613544 | causes_condition | MONDO:0013299
BNODE:feature611936 | causes_condition | MONDO:0012761
BNODE:feature612521 | contributes_to | OMIM:612521
BNODE:feature300705 | causes_condition | MONDO:0010406
BNODE:feature611015 | contributes_to | OMIM:611015
BNODE:feature608687 | causes_condition | MONDO:0012098
BNODE:feature612001 | causes_condition | MONDO:0012774
BNODE:feature300578 | causes_condition | MONDO:0010364
BNODE:feature612513 | causes_condition | MONDO:0012916
BNODE:feature612348 | contributes_to | MONDO:0012872
BNODE:feature215500 | causes_condition | MONDO:0024539
BNODE:feature614541 | causes_condition | MONDO:0013798
BNODE:feature613792 | causes_condition | MONDO:0013424
BNODE:feature610155 | contributes_to | OMIM:610155
BNODE:feature300498 | causes_condition | MONDO:0010344
BNODE:feature600049 | causes_condition | OMIM:600049
BNODE:feature613959 | causes_condition | OMIM:613959
BNODE:feature215400 | contributes_to | MONDO:0008978
BNODE:feature252270 | causes_condition | MONDO:0009646
BNODE:feature617930 | causes_condition | OMIM:617930
BNODE:feature194072 | causes_condition | MONDO:0008681

deepakunni3 commented 5 years ago

Here is a snippet from the TTL from which we got BNODE:bb7f86ea9b5ffee1,

<https://monarchinitiative.org/.well-known/genid/bb7f86ea9b5ffee1> a OBO:GENO_0000002 ;
    rdfs:label "germline variant of NT5C2" ;
    OBO:GENO_0000418 <http://www.orpha.net/ORDO/Orphanet_398201> ;
    OBO:RO_0003303 <http://www.orpha.net/ORDO/Orphanet_320396> ;
    :MONARCH_anonymous true ;
    :has_cell_origin OBO:GENO_0000900 .

It looks like this entity is a 'variant allele'.

It's being turned into a bnode because of the prefix mapping in the TTL,

@prefix : <https://monarchinitiative.org/> .
...

Is there a recommended way we should be treating these entities?

Also, these IDs are not permanent. It looks like they sometimes change between releases?

Here is a somewhat similar TTL entry (from the latest orphanet.ttl) but the ID is different,

<https://monarchinitiative.org/.well-known/genid/b78b75d23ff1b739b52e> a OBO:GENO_0000002 ;
    rdfs:label "germline variant of NT5C2" ;
    OBO:GENO_0000418 <http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=8022> ;
    OBO:RO_0003303 <http://www.orpha.net/ORDO/Orphanet_320396> ;
    :MONARCH_anonymous true ;
    :has_cell_origin OBO:GENO_0000900 .

I pulled this from the orphanet.ttl with the following timestamp,

orphanet.ttl                                       12-Mar-2019 00:29             7676950

Does the ID change only when a property of that entity changes?
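Both IRIs quoted above contain the path `/.well-known/genid/`, which matches the path RDF 1.1 Concepts recommends for skolem IRIs, i.e. IRIs minted to stand in for blank nodes. Assuming that is what dipper is doing here, such entities could be recognised from the IRI itself rather than from the prefix mapping. A minimal sketch follows; the function names are hypothetical and none of this is KGX code.

```python
from typing import Optional
from urllib.parse import urlparse

# Path segment that RDF 1.1 Concepts recommends for skolem IRIs.
GENID_PATH = "/.well-known/genid/"


def is_genid_iri(iri: str) -> bool:
    """True when the IRI's path starts with /.well-known/genid/."""
    return urlparse(iri).path.startswith(GENID_PATH)


def genid_suffix(iri: str) -> Optional[str]:
    """Return the part of the path after /.well-known/genid/, or None."""
    path = urlparse(iri).path
    if not path.startswith(GENID_PATH):
        return None
    return path[len(GENID_PATH):]


# IRIs copied from the TTL snippets in this thread.
old_iri = "https://monarchinitiative.org/.well-known/genid/bb7f86ea9b5ffee1"
new_iri = "https://monarchinitiative.org/.well-known/genid/b78b75d23ff1b739b52e"
orphanet_iri = "http://www.orpha.net/ORDO/Orphanet_320396"
```

Because the suffix differs between the two releases, it would not be safe to use it as a stable identifier; the check is only useful for deciding which nodes to treat as anonymous.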

cmungall commented 5 years ago

OK, so these are not true blank nodes, but they are effectively blank nodes. Either way, we don't want them. We do want to know what they connect, though, so we can change the dipper modeling to suggest direct connections. Let's make sure we make dipper tickets for each of these.

http://bit.ly/monarch-kg-modeling
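One way to see what the bnodes connect before filing tickets is to group their edges by edge type and target prefix. The sketch below assumes the rows have already been pulled out of Neo4j as `(n.id, type(e), m.id)` tuples, in the same shape as the query output earlier in this thread; the five rows are copied from that output, and the `summarize` helper is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, str, str]  # (bnode id, edge type, connected node id)

# A handful of rows copied from the Cypher output earlier in this thread.
rows = [
    ("BNODE:NCBITaxon:9606genome", "subclass_of", "SO:0001026"),
    ("BNODE:feature600273", "causes_condition", "MONDO:0010856"),
    ("BNODE:feature600202", "contributes_to", "MONDO:0010843"),
    ("BNODE:feature612521", "contributes_to", "OMIM:612521"),
    ("BNODE:feature600049", "causes_condition", "OMIM:600049"),
]


def summarize(rows: Iterable[Row]) -> Counter:
    """Count bnode edges per (edge type, prefix of the connected node)."""
    counts = Counter()
    for _bnode_id, edge_type, target_id in rows:
        target_prefix = target_id.split(":", 1)[0]
        counts[(edge_type, target_prefix)] += 1
    return counts


summary = summarize(rows)
for (edge_type, target_prefix), n in sorted(summary.items()):
    print(f"{edge_type} -> {target_prefix}: {n}")
```

Each distinct `(edge type, prefix)` pair is a candidate for one modeling ticket, though how to split the tickets is a judgment call for the dipper team.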
