emorynlp / nlp4j

NLP framework for JVM languages.
http://emorynlp.github.io/nlp4j/

Inequality symbols resulting in sentence boundaries (maybe) #33

Open jctetter opened 6 years ago

jctetter commented 6 years ago

Versions: nlp4j-english and nlp4j-api 1.1.3

The >, <, and >= symbols look like they are being used as sentence boundaries. = doesn't have this problem.

please show revenue > five -> 2 roots
please show revenue < five -> 2 roots
please show revenue >= five -> 1 root
please show revenue <= five -> 2 roots, 1 of which has a parent

Enter a test sentence:  please show revenue > five                                                       
  id      word_form          lemma   pos_tag       feat_map     dependency    semantic_heads   nament_tag
   0          @#r$%          @#r$%     @#r$%              _            _:_                 _        @#r$%
   1         please         please        UH              _    2:discourse                 _            O
   2           show           show        VB       pos2=VBP         0:root                 _            O
   3        revenue        revenue        NN              _         2:dobj                 _            O
   4              >              >         :       pos2=SYM        5:punct                 _            O
   5           five          #crd#        CD              _         0:root                 _   U-CARDINAL

Enter a test sentence:  please show revenue < five                                                       
  id      word_form          lemma   pos_tag       feat_map     dependency    semantic_heads   nament_tag
   0          @#r$%          @#r$%     @#r$%              _            _:_                 _        @#r$%
   1         please         please        UH              _    2:discourse                 _            O
   2           show           show        VB       pos2=VBP         0:root                 _            O
   3        revenue        revenue        NN              _     5:compound                 _            O
   4              <              <      HYPH         pos2=:        5:punct                 _            O
   5           five          #crd#        CD              _         0:root                 _   U-CARDINAL

Enter a test sentence:  please show revenue >= five                                                      
  id      word_form          lemma   pos_tag       feat_map     dependency    semantic_heads   nament_tag
   0          @#r$%          @#r$%     @#r$%              _            _:_                 _        @#r$%
   1         please         please        UH              _    2:discourse                 _            O
   2           show           show        VB       pos2=VBP         0:root                 _            O
   3        revenue        revenue        NN              _         2:dobj                 _            O
   4              >              >       SYM     pos2=-RRB-        6:punct                 _            O
   5              =              =       SYM        pos2=CC        6:punct                 _            O
   6           five          #crd#        CD              _         3:nmod                 _   U-CARDINAL

Enter a test sentence:  please show revenue <= five                                                      
  id      word_form          lemma   pos_tag       feat_map     dependency    semantic_heads   nament_tag
   0          @#r$%          @#r$%     @#r$%              _            _:_                 _        @#r$%
   1         please         please        UH              _    2:discourse                 _            O
   2           show           show        VB       pos2=VBP         0:root                 _            O
   3        revenue        revenue        NN              _         2:dobj                 _            O
   4              <              <       SYM        pos2=XX        6:punct                 _            O
   5              =              =       SYM        pos2=CC        6:punct                 _            O
   6           five          #crd#        CD              _         2:root                 _   U-CARDINAL
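
The root counts listed at the top can be checked mechanically from the printed tables. Below is a minimal sketch that does so, assuming each row is whitespace-separated and the `dependency` field (`head:label`) is the sixth column, as in the output above. The class `RootCheck` and its methods are hypothetical helpers written for this report; they are not part of the nlp4j API.

```java
import java.util.List;

/** Hypothetical helper: counts root attachments in the column-formatted output above. */
class RootCheck {
    // Assumed column order: id, word_form, lemma, pos_tag, feat_map, dependency, ...
    private static final int DEPENDENCY_COLUMN = 5;

    /** Returns the "head:label" field of a row, or null if the row has too few columns. */
    static String dependencyOf(String row) {
        String[] cols = row.trim().split("\\s+");
        return cols.length > DEPENDENCY_COLUMN ? cols[DEPENDENCY_COLUMN] : null;
    }

    /** Number of tokens labeled root, regardless of which head they attach to. */
    static long countRoots(List<String> rows) {
        return rows.stream()
                .map(RootCheck::dependencyOf)
                .filter(d -> d != null && d.endsWith(":root"))
                .count();
    }

    /** Number of tokens labeled root whose head is not the artificial node 0. */
    static long countRootsWithParent(List<String> rows) {
        return rows.stream()
                .map(RootCheck::dependencyOf)
                .filter(d -> d != null && d.endsWith(":root") && !d.startsWith("0:"))
                .count();
    }

    public static void main(String[] args) {
        // Rows copied from the "please show revenue <= five" output above.
        List<String> lessOrEqual = List.of(
                "0 @#r$% @#r$% @#r$% _ _:_ _ @#r$%",
                "1 please please UH _ 2:discourse _ O",
                "2 show show VB pos2=VBP 0:root _ O",
                "3 revenue revenue NN _ 2:dobj _ O",
                "4 < < SYM pos2=XX 6:punct _ O",
                "5 = = SYM pos2=CC 6:punct _ O",
                "6 five #crd# CD _ 2:root _ U-CARDINAL");
        System.out.println("roots: " + countRoots(lessOrEqual));
        System.out.println("roots with a parent: " + countRootsWithParent(lessOrEqual));
    }
}
```

Run on the `<=` rows, this reports two roots, one of which has a parent, which matches the summary for that sentence. A single well-formed sentence would be expected to report one root and none with a parent.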