kodecharlie closed 5 years ago
Looking at the .ttl file output: you need to include the headers so that the various prefixes parse. You can cut'n'paste from other files (e.g. corrections.ttl). Also, any URI that isn't collapsed with a prefix has to be surrounded by angle brackets, i.e. <http://whatever>, not http://whatever.
Try loading the output into any parser and you should get a list of complaints. You'll also need to escape-out the descriptions manually, but a simple string find & replace may be good enough (the descriptions usually aren't diabolical, and the robustness criteria for this aren't super high).
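To make the three points above concrete (prefix headers, angle brackets around uncollapsed URIs, and escaping descriptions), here is a minimal Python sketch. It is not the project's code: the helper names are hypothetical, and it assumes descriptions are written as double-quoted single-line Turtle literals and that only simple alphanumeric local names are collapsed to a prefix.

```python
import re

# Characters that must be escaped inside a double-quoted Turtle literal.
TURTLE_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_turtle_literal(text):
    """Escape a description for use inside a double-quoted Turtle literal."""
    return "".join(TURTLE_ESCAPES.get(ch, ch) for ch in text)


def prefix_header(prefixes):
    """Build the @prefix lines that must precede any prefixed name."""
    return "\n".join(f"@prefix {name}: <{base}> ." for name, base in prefixes.items())


def format_uri(uri, prefixes):
    """Collapse a URI to prefix:local when possible, else wrap it in <...>."""
    for name, base in prefixes.items():
        local = uri[len(base):] if uri.startswith(base) else ""
        # Assumption: only collapse plain alphanumeric local names.
        if re.fullmatch(r"[A-Za-z0-9_]+", local):
            return f"{name}:{local}"
    return f"<{uri}>"
```

A simple find & replace over quotes and backslashes covers the same ground for well-behaved descriptions; the table just keeps the escapes in one place.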
Also: I think having the code within the axioms package makes sense for various soft reasons (like the lack of anywhere better to put it), but calling the pairing file data/repair/axiom_cellpairs.json is a bit confusing, because it's a general-purpose fix, not specific to axioms. That being said, we might deprecate the file, so if that's the case, no need to worry about it.
@aclarkxyz -- I believe these latest changes address most of your concerns. Note that any BRENDA description ported to a CLO term will be escaped in accordance with W3C specs. However, we still do not write the term hierarchy if a CLO term is written (i.e., commented out) that needs a description.
This is the cell line outcome for beta currently:
Note that the CLO is quite small because it's the original BAO-curated selection. BRENDA has 3279 items.
The new one invokes the CLO directly (or at least the categories we're interested in):
Note that the number of entries for the BRENDA section is now 2807, because the duplicates were migrated over into the CLO hierarchy (see data/ontology/cellmerge.ttl). In these instances, the CLO has borrowed the description from BRENDA (e.g. see HT-1080).
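The migration described above can be sketched roughly as follows. This is an illustration in Python, not the actual implementation: the function name, the dictionary representation of each hierarchy, and the rule that a BRENDA description is borrowed only when the CLO term has none are all assumptions.

```python
def migrate_duplicates(brenda, clo, pairs):
    """Move matched BRENDA terms into the CLO side.

    brenda, clo: dict mapping a term identifier to its description (or None).
    pairs: iterable of (clo_id, brenda_id) duplicates.
    Returns (remaining_brenda, updated_clo); the inputs are not modified.
    """
    remaining = dict(brenda)
    updated = dict(clo)
    for clo_id, brenda_id in pairs:
        if clo_id not in updated or brenda_id not in remaining:
            continue  # skip pairs that refer to unknown terms
        description = remaining.pop(brenda_id)
        # Assumption: borrow the BRENDA description only if CLO lacks one.
        if not updated[clo_id]:
            updated[clo_id] = description
    return remaining, updated
```

Under this model, each migrated duplicate removes one entry from the BRENDA section, which matches the drop from 3279 to 2807 entries (472 fewer).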
Compile changes in the normal way:
Then run CellLineFix like so:
where /tmp/z.txt houses candidate matches between CLO and BRENDA cell lines, and the file ~/tmp/cell-line-fixes.ttl contains Turtle-style directives for amending the relevant trees.
Take a look at:
handleMatchedTerms
handleUnmatchedBRENDATerm
handleUnmatchedCLOTerm
Alex, I've tried to incorporate your key points into these methods. See the XXX near the top of the class CellLineFix. I'm not sure what we should do for the CLO identifiers for the interim / artificial branches that house BRENDA cell / tissue terms.
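The three method names listed above suggest a three-way split of terms. Their actual behaviour is not shown in this thread, so the following Python sketch is only a guess at the classification step that would feed them; the function name and the data shapes are hypothetical.

```python
def classify_terms(clo_terms, brenda_terms, pairs):
    """Split terms into matched pairs, unmatched BRENDA, and unmatched CLO.

    clo_terms, brenda_terms: collections of term identifiers.
    pairs: iterable of candidate (clo_id, brenda_id) matches.
    """
    clo_set, brenda_set = set(clo_terms), set(brenda_terms)
    # Keep only candidate pairs whose two sides both exist.
    matched = [(c, b) for c, b in pairs if c in clo_set and b in brenda_set]
    matched_clo = {c for c, _ in matched}
    matched_brenda = {b for _, b in matched}
    unmatched_brenda = sorted(brenda_set - matched_brenda)
    unmatched_clo = sorted(clo_set - matched_clo)
    return matched, unmatched_brenda, unmatched_clo
```

Each of the three results would then go to the corresponding handler: matched pairs to handleMatchedTerms, and the two leftover lists to handleUnmatchedBRENDATerm and handleUnmatchedCLOTerm.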