VirtualFlyBrain / curation

A repository of records specifying curation into the VFB Knowledge Base

Fix short_form discrepancies #66

Open matentzn opened 4 years ago

matentzn commented 4 years ago

Some entities in the KB have non-derivable short_forms; that is, the KB uses a short_form that the PDB cannot derive again from the IRI. This may or may not be a problem, but I wanted to point it out: I noticed that some datasets in p2.pdb have different short_forms than in pdb1.

MATCH (n:Entity) WHERE not n.iri =~ ('.*' + n.short_form) RETURN n.iri, n.short_form LIMIT 250

This query returns 40 entities with such discrepancies. To fix this, you would have to:

  1. Fix the site: issue in general (URLs don't always have short forms)
  2. Fix some entities (for example, datasets) manually by changing the short_form to the corresponding IRI fragment.

Example:

"http://virtualflybrain.org/data/Aso_Rubin_2016" | "AsoRubin2016"
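The check and fix 2 can be sketched in Python. This is a minimal sketch, assuming a short_form is derived by taking the IRI fragment after the last / or # (the rule dosumis states further down). The function names are hypothetical and not part of any VFB tool. It is also stricter than the regex in the query: it compares against the whole fragment, whereas `=~ ('.*' + n.short_form)` only checks that the IRI ends with the short_form.

```python
import re

def derive_short_form(iri: str) -> str:
    """Return the IRI fragment after the last '/' or '#' (assumed derivation rule)."""
    return re.split(r"[/#]", iri)[-1]

def find_discrepancies(rows):
    """Yield (iri, stored, derived) where the stored short_form differs from the derived one."""
    for iri, stored in rows:
        derived = derive_short_form(iri)
        if stored != derived:
            yield iri, stored, derived

rows = [
    ("http://virtualflybrain.org/data/Aso_Rubin_2016", "AsoRubin2016"),  # example from above
    ("http://example.org/data/Some_Entity", "Some_Entity"),  # hypothetical consistent entity
]
for iri, stored, derived in find_discrepancies(rows):
    # The derived value is what fix 2 would write back as the short_form
    print(f"{iri}: stored {stored!r}, derivable {derived!r}")
```

Only the dataset row is reported, with Aso_Rubin_2016 as the derivable short_form.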

Robbie1977 commented 4 years ago

@dosumis This also affects DOI pub nodes:

KB: short_form:doi_10_1101_376384 
PDB.p2: short_form:376384 

Note: irrespective of whether the short_form uses the new '__ = /' format or not, we should standardise whatever we do.

dosumis commented 4 years ago

I wonder why it is truncating at _. The only characters that should be used for short_form derivation are / and #.

matentzn commented 4 years ago

Remember the IRI has a / there; that is probably why.

dosumis commented 4 years ago

You beat me to it. Just realised that.
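The truncation can be illustrated with a short sketch. It assumes the DOI pub node's IRI is the resolver URL https://doi.org/10.1101/376384 (the thread does not show the actual IRI) and that derivation keeps everything after the last / or #. Because the DOI itself contains a /, that rule yields only the DOI suffix. The doi_short_form function is a hypothetical reconstruction that merely reproduces the KB value quoted above; it is not the KB's actual code.

```python
def derive_short_form(iri: str) -> str:
    # Assumed rule: keep everything after the last '/' or '#'
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[cut + 1:]

def doi_short_form(doi: str) -> str:
    # Hypothetical KB-style encoding: 'doi_' prefix, '.' and '/' replaced by '_'
    return "doi_" + doi.replace(".", "_").replace("/", "_")

doi = "10.1101/376384"
iri = "https://doi.org/" + doi  # assumed IRI form for a DOI pub node
print(derive_short_form(iri))   # 376384 (what PDB.p2 derives)
print(doi_short_form(doi))      # doi_10_1101_376384 (what the KB stores)
```

Whichever convention is adopted, the KB and the PDB pipeline would need to apply the same function for the two to agree.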