clulab / reach

Reach Biomedical Information Extraction
Other
97 stars 39 forks source link

Integrate fix to #722 into master #726

Closed enoriega closed 3 years ago

enoriega commented 3 years ago

Before moving approving, what should we do about the bioresources version number?

kwalcock commented 3 years ago

Let me know if you need to have bioresources republished with an updated version number.

enoriega commented 3 years ago

@kwalcock, Please do update the version number to 1.1.37 and republish it. Thanks!

kwalcock commented 3 years ago

Do the kb/ner files need to be regenerated because of this PR?

enoriega commented 3 years ago

No, I already regenerated them and pushed them along the latest commit this morning

kwalcock commented 3 years ago

With the current bioresources master branch and testing with either master or issue722 from reach, there seem to be errors:

[info] TestNERLabeling:
[info] apoptosis, autophagic cell death, quiescence, hematopoiesis, or complex assembly cause cancer.
[info] - should have BioProcess label (56 milliseconds)
[info] MPanc-96, mast, Hyssop, CEM/TART, and ZR75-1 cause cancer.
[info] - should have CellLine label (52 milliseconds)
[info] apud cell, AV nodal myocyte, An1 B Cell, xanthoblast, and zygote cause cancer
[info] - should have CellType label *** FAILED *** (99 milliseconds)
[info]   Vector(org.clulab.reach.mentions.CorefTextBoundMention@23b5ee8e, org.clulab.reach.mentions.CorefTextBoundMention@6fedde8e, org.clulab.reach.mentions.CorefTextBoundMention@c710390, org.clulab.reach.mentions.CorefTextBoundMention@ffd18082) had size 4 instead of expected size 5 (TestNERLabeling.scala:56)
[info] apud cells, AV nodal myocytes, An1 B Cells, xanthoblasts, and zygotes cause cancer
[info] - should have CellType label *** FAILED *** (62 milliseconds)
[info]   Vector(org.clulab.reach.mentions.CorefTextBoundMention@87cd63e6, org.clulab.reach.mentions.CorefTextBoundMention@567186bc, org.clulab.reach.mentions.CorefTextBoundMention@fad155e, org.clulab.reach.mentions.CorefTextBoundMention@36751ec) had size 4 instead of expected size 5 (TestNERLabeling.scala:64)

[info] TestReachContextKBLister:
[info] Context KBs list
[info] - should have at least 1000 entries (0 milliseconds)
[info] Context KBs list
[info] - should have cell line entries (28 milliseconds)
[info] Context KBs list
[info] - should have cell type entries *** FAILED *** (12 milliseconds)
[info]   26 was not equal to 22 (TestReachContextKBLister.scala:38)

[info] TestNERStopList:
[info] org.clulab.reach.TestNERStopList *** ABORTED *** (0 milliseconds)
[info]   Duplicate test name: fig should not have a mention (TestNERStopList.scala:18)

[info] *** 1 SUITE ABORTED ***
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]     org.clulab.reach.TestNERLabeling
[error]     org.clulab.reach.TestReachContextKBLister
[error] Error during tests:
[error]     org.clulab.reach.TestNERStopList
[error] (main / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 349 s (05:49), completed Jan 18, 2021 11:44:10 AM
The 'main/test' suite failed!
bgyori commented 3 years ago

The errors might be related to the resource file updates in https://github.com/clulab/bioresources/pull/54, I can look into it this afternoon.

enoriega commented 3 years ago

At least one error was introduced accidentally by me. I fixed bioresources and the TestNERStopList suite shouldn't crash anymore

bgyori commented 3 years ago

I examined TestReachContextKBLister which is a set of tests looking at the content of resource files directly. The counts should be updated as follows to be consistent with https://github.com/clulab/bioresources/pull/54:

    (ctypes.count(cg => hasText(cg, "granulocyte"))) should be (26)
    (ctypes.count(cg => hasId(cg, "CL:0000557"))) should be (14)
    (ctypes.count(cg => hasText(cg, "hair"))) should be (54)

Hope I didn't miscount! I'll look at TestNERLabeling next.

bgyori commented 3 years ago

I looked into

[info] apud cell, AV nodal myocyte, An1 B Cell, xanthoblast, and zygote cause cancer
[info] - should have CellType label *** FAILED *** (99 milliseconds)
[info]   Vector(org.clulab.reach.mentions.CorefTextBoundMention@23b5ee8e, org.clulab.reach.mentions.CorefTextBoundMention@6fedde8e, org.clulab.reach.mentions.CorefTextBoundMention@c710390, org.clulab.reach.mentions.CorefTextBoundMention@ffd18082) had size 4 instead of expected size 5 (TestNERLabeling.scala:56)
[info] apud cells, AV nodal myocytes, An1 B Cells, xanthoblasts, and zygotes cause cancer
[info] - should have CellType label *** FAILED *** (62 milliseconds)
[info]   Vector(org.clulab.reach.mentions.CorefTextBoundMention@87cd63e6, org.clulab.reach.mentions.CorefTextBoundMention@567186bc, org.clulab.reach.mentions.CorefTextBoundMention@fad155e, org.clulab.reach.mentions.CorefTextBoundMention@36751ec) had size 4 instead of expected size 5 (TestNERLabeling.scala:64)

These happen because zygote was removed from the Cell Ontology, so I updated to test to use zygospore instead. I pushed both changes to the PR branch.

kwalcock commented 3 years ago

Thanks! I'll be looking carefully tomorrow.

MihaiSurdeanu commented 3 years ago

Thank you @bgyori !

enoriega commented 3 years ago

After @bgyori's changes, all tests pass in this branch.

kwalcock commented 3 years ago

Thanks, @enoriega. I'll verify and then it sounds like it will help to publish bioresources so that a localPublish of that is not necessary for reach. Then the dependency in reach can be updated and merged with master. Is that right?

enoriega commented 3 years ago

That's correct, @kwalcock. Thanks!

enoriega commented 3 years ago

Thanks @kwalcock