Constannnnnt / Distributed-CoreNLP

This infrastructure, built on Stanford CoreNLP, MapReduce and Spark with Java, aims to process document annotations at large scale.
https://github.com/Constannnnnt/Distributed-CoreNLP
MIT License

sentiment update #26

Status: Closed (closed by Constannnnnt 5 years ago)

Constannnnnt commented 5 years ago

Update sentiment: for each document, use the sentiment of the longest sentence. This assumes there is only one sentence per line, and it has to deal with corner cases such as "Mr." or "et al.".
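A minimal sketch of the longest-sentence rule, written in plain Java so it runs without CoreNLP on the classpath. The `ScoredSentence` record and `lineSentiment` method are hypothetical names, not part of the repository; in the real pipeline the per-sentence classes would come from CoreNLP's sentiment annotator (0 = very negative up to 4 = very positive). The example uses the second line of the "output before" sample.

```java
import java.util.Comparator;
import java.util.List;

public class LongestSentenceSentiment {

    // Hypothetical holder for one sentence produced by the sentence splitter
    // and its predicted sentiment class (CoreNLP scale: 0..4).
    record ScoredSentence(String text, int sentiment) {}

    // Returns the sentiment of the longest sentence on a line, or -1 if the
    // splitter produced no sentences. Because each line is assumed to hold a
    // single real sentence, short fragments (e.g. "[9]") are ignored.
    static int lineSentiment(List<ScoredSentence> sentences) {
        return sentences.stream()
                .max(Comparator.comparingInt(s -> s.text().length()))
                .map(ScoredSentence::sentiment)
                .orElse(-1);
    }

    public static void main(String[] args) {
        // Second line of the "output before" sample, as split by the annotator.
        List<ScoredSentence> line = List.of(
                new ScoredSentence("His work is also known for its influence on the philosophy of science.", 2),
                new ScoredSentence("[7][8] He is best known to the general public for his mass-energy equivalence formula E = mc2, which has been dubbed \"the world's most famous equation\".", 3),
                new ScoredSentence("[9]", 2));
        // Prints "(3)", matching the second line of the "output after" sample.
        System.out.println("(" + lineSentiment(line) + ")");
    }
}
```

Measuring length in characters is an assumption of this sketch; counting tokens would behave much the same here, since citation fragments such as `[9]` are always far shorter than the actual sentence.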

Output before:

(1,Albert Einstein 14 March 1879 ~ 18 April 1955) was a German-born theoretical physicist[5] who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).) (2,[3][6]:274)
(2,His work is also known for its influence on the philosophy of science.) (3,[7][8] He is best known to the general public for his mass-energy equivalence formula E = mc2, which has been dubbed "the world's most famous equation".) (2,[9])
(1,He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect",[10] a pivotal step in the development of quantum theory.)

Output after:

(1)
(3)
(1)