cdd / bioassay-template


Issue bae#906: 1st draft of logic that emits semantic directives for folding BRENDA terms into CLO branch in vocabulary. #107

Closed kodecharlie closed 5 years ago

kodecharlie commented 5 years ago

Compile changes in the normal way:

$ ant clean ; ant pkg

Then run CellLineFix like so:

$ java -cp pkg/BioAssayTemplate.jar com.cdd.bao.axioms.CellLineFix --score 1 --outfile ~/tmp/cell-line-fixes.ttl > /tmp/z.txt

where /tmp/z.txt houses candidate matches between CLO and BRENDA cell lines, and the file ~/tmp/cell-line-fixes.ttl contains Turtle-style directives for amending the relevant trees.

Take a look at:

Alex, I've tried to incorporate your key points into these methods. See the XXX comment near the top of the CellLineFix class. I'm not sure what we should use as CLO identifiers for the interim / artificial branches that house BRENDA cell / tissue terms.

aclarkxyz commented 5 years ago

Looking at the .ttl file output: you need to include the headers so that the various prefixes parse. You can cut'n'paste them from other files (e.g. corrections.ttl). Also, any URI that isn't abbreviated with a prefix has to be surrounded by angle brackets, i.e. <http://whatever> not http://whatever.
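To illustrate the two points above (a prefix header, and angle brackets around any URI that is not abbreviated), here is a minimal Java sketch. The class name `TurtleTerms`, the methods `header` and `formatURI`, and the two-entry prefix table are all hypothetical and are not taken from CellLineFix; the real prefix list should be copied from an existing file such as corrections.ttl.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TurtleTerms
{
    // Assumed prefix table for illustration only; copy the real one from corrections.ttl.
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static
    {
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        PREFIXES.put("obo", "http://purl.obolibrary.org/obo/");
    }

    // Emits one "@prefix" line per entry; this goes at the top of the .ttl file.
    public static String header()
    {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : PREFIXES.entrySet())
            sb.append("@prefix ").append(e.getKey()).append(": <").append(e.getValue()).append("> .\n");
        return sb.toString();
    }

    // Abbreviates a URI with a known prefix when the remainder is a simple local name;
    // otherwise wraps the full URI in angle brackets, as Turtle requires.
    public static String formatURI(String uri)
    {
        for (Map.Entry<String, String> e : PREFIXES.entrySet())
        {
            String base = e.getValue();
            if (uri.startsWith(base))
            {
                String local = uri.substring(base.length());
                // Conservative subset of the local names Turtle allows.
                if (local.matches("[A-Za-z0-9_]+")) return e.getKey() + ":" + local;
            }
        }
        return "<" + uri + ">";
    }
}
```

With this table, `formatURI("http://purl.obolibrary.org/obo/CLO_0000001")` yields `obo:CLO_0000001`, while a URI that matches no prefix comes back wrapped in angle brackets.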

Try loading the output into any parser and you should get a list of complaints. You'll also need to escape the descriptions manually, but a simple string find & replace may be good enough (the descriptions usually aren't diabolical, and the robustness criteria for this aren't super high).
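As a rough sketch of the escaping step, the helper below turns a description into a double-quoted Turtle string literal by escaping backslashes, double quotes, and the common control characters. The class name `TurtleLiteral` and the method `quote` are hypothetical; this shows one simple approach, not the code in CellLineFix.

```java
public class TurtleLiteral
{
    // Wraps text in double quotes and escapes the characters that may not appear
    // raw inside a short Turtle string literal.
    public static String quote(String text)
    {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++)
        {
            char ch = text.charAt(i);
            switch (ch)
            {
                case '\\': sb.append("\\\\"); break; // backslash must be escaped first-class
                case '"': sb.append("\\\""); break;  // would otherwise end the literal
                case '\n': sb.append("\\n"); break;  // raw newlines are not allowed
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;  // optional, but keeps output on one line
                default: sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }
}
```

Handling the backslash inside the same character loop avoids the double-escaping that can happen when chained find & replace calls are applied in the wrong order.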

aclarkxyz commented 5 years ago

Also: I think having the code within the axioms package makes sense for various soft reasons (like the lack of anywhere better to put it), but calling the pairing file data/repair/axiom_cellpairs.json is a bit confusing, because it's a general-purpose fix, not specific to axioms. That said, we might deprecate the file, in which case there's no need to worry about it.

kodecharlie commented 5 years ago

@aclarkxyz -- I believe these latest changes address most of your concerns. Note that any BRENDA description ported to a CLO term will be escaped in accordance with W3C specs. However, when a CLO term that needs a description is written, we still do not write its term hierarchy (i.e., that part is commented out).

aclarkxyz commented 5 years ago

This is the cell line outcome for beta currently:

image

Note that the CLO is quite small because it's the original BAO-curated selection. BRENDA has 3279 items.

The new one invokes the CLO directly (or at least the categories we're interested in):

image

Note that the number of entries in the BRENDA section is now 2807, because the duplicates were migrated over into the CLO hierarchy (see data/ontology/cellmerge.ttl). In these instances, the CLO term has borrowed its description from BRENDA (e.g. see HT-1080).