HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
409 stars, 77 forks

Enable RegexMatchSpan to match concatenated words with the sep="(separator)" option #492

Closed: YasushiMiyata closed this 4 years ago

YasushiMiyata commented 4 years ago

Description of the problems or issues

Is your pull request related to a problem? Please describe.

The sentence "123 456 789" is parsed into three words: "123", "456", and "789". I'd like to match the whole number with RegexMatchSpan(rgx=r"\d{9}", sep=" "), but sep=" " currently has no effect.
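The mismatch can be reproduced with the standard `re` module alone. This is only a sketch of the symptom, assuming the span text is the three words joined by single spaces; it does not call Fonduer.

```python
import re

# Words produced by parsing the sentence "123 456 789".
words = ["123", "456", "789"]

# Assumption: the text of the span is the words joined by one space.
span_text = " ".join(words)

nine_digits = re.compile(r"\d{9}")

# The spaces interrupt the run of digits, so the pattern does not match.
match_with_spaces = nine_digits.fullmatch(span_text)

# With the separator removed, the text is "123456789" and the pattern matches.
match_without_sep = nine_digits.fullmatch(span_text.replace(" ", ""))

print(match_with_spaces is not None)
print(match_without_sep is not None)
```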

Does your pull request fix any issue? Fixes #270

Description of the proposed changes

Enable the sep="(separator)" option of RegexMatchSpan. It concatenates the words of a mention span into a single string and applies the regex match without regard to the separator.
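The intended behavior can be sketched roughly as follows. `matches_ignoring_sep` is a hypothetical helper written for illustration, not the code in this pull request, and the use of a full match (rather than a search) is an assumption.

```python
import re


def matches_ignoring_sep(span_text: str, rgx: str, sep: str = " ") -> bool:
    """Hypothetical helper: match rgx against span_text with sep removed."""
    # Remove every occurrence of the separator so the words are concatenated.
    collapsed = span_text.replace(sep, "") if sep else span_text
    # Assumption: the whole collapsed text must match the pattern.
    return re.fullmatch(rgx, collapsed) is not None


# The nine digits match once the spaces between the words are dropped.
digits_match = matches_ignoring_sep("123 456 789", r"\d{9}", sep=" ")

# With an empty separator nothing is removed, so the spaces block the match.
digits_no_sep = matches_ignoring_sep("123 456 789", r"\d{9}", sep="")
```

A separator other than a space, such as "-", would be handled the same way under these assumptions.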

Test plan

Add test code to 'fonduer/tests/candidates/test_matchers.py'. The sentence "This is apple" is parsed into two 2-grams, "This is" and "is apple". The following rgx with the sep=" " (space) option matches "is apple": RegexMatchSpan(rgx=r"isapple", sep=" ")
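The shape of such a test might look like the sketch below. It is a standalone approximation that uses only the standard library; `ngrams` and `matches` are hypothetical stand-ins for Fonduer's mention extraction and for RegexMatchSpan, so the real test in test_matchers.py will differ.

```python
import re
from typing import List


def ngrams(sentence: str, n: int) -> List[str]:
    """Hypothetical stand-in: return all n-grams of whitespace-split words."""
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def matches(span_text: str, rgx: str, sep: str = " ") -> bool:
    """Hypothetical stand-in for the matcher: full match with sep removed."""
    return re.fullmatch(rgx, span_text.replace(sep, "")) is not None


def test_regex_match_with_sep() -> None:
    spans = ngrams("This is apple", 2)
    assert spans == ["This is", "is apple"]

    # Only "is apple" collapses to "isapple"; "This is" collapses to "Thisis".
    matched = [s for s in spans if matches(s, r"isapple", sep=" ")]
    assert matched == ["is apple"]


test_regex_match_with_sep()
```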

Checklist

YasushiMiyata commented 4 years ago

Some code may have been changed while I was creating #492. I'm re-checking it now.

codecov-commenter commented 4 years ago

Codecov Report

Merging #492 into master will not change coverage. The diff coverage is 71.42%.

@@           Coverage Diff           @@
##           master     #492   +/-   ##
=======================================
  Coverage   85.85%   85.85%           
=======================================
  Files          88       88           
  Lines        4568     4568           
  Branches      851      853    +2     
=======================================
  Hits         3922     3922           
  Misses        464      464           
  Partials      182      182

Flag         Coverage Δ
#unittests   85.85% <71.42%> (ø)

Impacted Files                                          Coverage Δ
...fonduer/candidates/models/implicit_span_mention.py   81.96% <66.66%> (ø)
src/fonduer/candidates/models/span_mention.py           82.24% <66.66%> (ø)
src/fonduer/candidates/matchers.py                      97.31% <100.00%> (ø)

YasushiMiyata commented 4 years ago

Something failed during installation on Ubuntu. There seems to be nothing more I can do about it.

senwu commented 4 years ago

Thanks for making this clear!