bgyori closed this 4 years ago
This list will have to be adjusted in processors: https://github.com/clulab/processors/blob/master/corenlp/src/main/resources/reference.conf#L67
Can you please run the reach unit tests with this branch?
First, I made modifications in processors to regenerate the NER files with ner_kb.sh. I then made a bunch of updates in Reach to remove the old BE files and use the new FamPlex file with the new FamilyOrComplex label. I got to a point where tests are running without compilation errors. But unsurprisingly, due to the relabeling of a lot of entities that were previously Family or Complex, many tests are failing (179 of them). It looks like certain rules or actions also make assumptions about the type of entity, and unless FamilyOrComplex is propagated into those places and handled, those rules will fail. One example is
We analyze the Mek-Ras-Akt1 complex
which is not extracted since Mek and Ras now resolve to FamilyOrComplex.
I am wondering if it's worth going through and handling FamilyOrComplex everywhere necessary in Reach, or alternatively, sticking all of FamPlex under an existing label like Gene_or_gene_product (since I don't think the distinction between Gene_or_gene_product and Complex/Family matters mechanistically).
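To make the second option concrete, here is a minimal sketch of what folding the labels would do to a label-sensitive rule. It is plain Python and not Reach or processors code: `fold_label`, `rule_accepts`, and the `FOLDED_LABELS` set are hypothetical names, and the assumption that Akt1 resolves to Gene_or_gene_product is mine, not something confirmed in this thread.

```python
# Illustrative only: label names come from this thread, everything else is hypothetical.
GGP = "Gene_or_gene_product"
FOLDED_LABELS = {"Family", "Complex", "FamilyOrComplex"}


def fold_label(label: str) -> str:
    """Map family/complex-style labels onto GGP; leave all other labels unchanged."""
    return GGP if label in FOLDED_LABELS else label


def rule_accepts(labels, allowed=frozenset({GGP})):
    """Toy stand-in for a rule that only fires when every argument has an allowed label."""
    return all(label in allowed for label in labels)


# Mek and Ras resolve to FamilyOrComplex (as reported above);
# Akt1 resolving to GGP is an assumption made for this example.
mentions = {"Mek": "FamilyOrComplex", "Ras": "FamilyOrComplex", "Akt1": GGP}

before = rule_accepts(mentions.values())
after = rule_accepts(fold_label(label) for label in mentions.values())
print(before, after)
```

Under these assumptions the toy rule rejects the Mek-Ras-Akt1 arguments before folding and accepts them afterwards, which is why folding avoids having to propagate FamilyOrComplex into every rule and action.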
If there is no mechanistic difference, I would be in favor of folding all of protein/family/complex under GGP.
If we're all Ok with this plan, I propose:
Okay, that sounds good, I will do that as soon as I can today.
I tried a couple of options and settled on one in which:
This results in a relatively small number of test failures in Reach that are hopefully not too much work to sort out:
[info] *** 27 TESTS FAILED ***
[error] Failed tests:
[error] org.clulab.reach.TestGrounding
[error] org.clulab.reach.TestComplexResolutions
[error] org.clulab.reach.TestFamilyResolutions
[error] org.clulab.reach.TestOverrides
@MihaiSurdeanu what do you think of this as an interim solution? For a subsequent release, we could try to do a more comprehensive refactoring of protein-like entities but I think this should be good for now.
Sounds good. I'll work on this branch then.
I am working on this. But I am coupling this fix with a software restructuring of processors and reach. It looks like it will take more than 1 day. Hopefully not much more...
I merged this into master, and released 1.1.30.
This PR addresses #22 as follows:
Question/issue for @MihaiSurdeanu: the "FamilyOrComplex" label is new. I found that the KBLoader doesn't recognize it, see its output below:
and doesn't produce a ner/FamilyOrComplex.tsv.gz as expected. What needs to be changed for it to do that?