CogComp / cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
http://nlp.cogcomp.org/

Sentence annotator doesn't work when a period isn't followed by a space #561

Open haowu40 opened 7 years ago

haowu40 commented 7 years ago
2013-08-05T22: 47: 53 China's Ministry of Public Security to crack down on the use of "pseudo-base station" to implement illegal and criminal activities arrested 217 suspects liuyizhan China's Ministry of Public Security to crack down on the use of "pseudo-base station" to carry out criminal activities arrested 217 suspects Xinhua News Agency Beijing, August 5 (Reporter Liu Yi Zhan) reporter learned from the Ministry of Public Security on the 5th, the Ministry of Public Security recently deployed command of Beijing, Liaoning, Hunan, Guangdong and other 12 provinces and municipalities public security organs to focus on combat "pseudo base station" action, Gang 72, arrested 217 suspects, cracked all kinds of criminal cases 429, destroyed "pseudo base station" equipment production dens 4, seized "pseudo base station" equipment 96 sets.This year, Hunan, Guangdong and other places public security organs have repeatedly received the masses reported that the phone number was fooled to send fraud messages, or in banks, airports and other regional mobile phone often no signal and received a large number of advertising marketing SMS.According to the preliminary investigation, this is a new type of illegal and criminal activities using the "pseudo base station" equipment, and has found that Shanghai, Shenzhen and other places to produce and sell "fake base station" equipment crime dens, equipment sold to Beijing, Liaoning and other places.It is understood that the "pseudo base station" equipment is a high-tech equipment, mainly by the host and notebook computers, to search for its center, a certain radius within the mobile phone card information, and any fraudulent use of other mobile phone numbers to the user Mobile phone to send fraud, advertising and other short message.Such equipment is running, the user's mobile phone signal is forced to connect to the device, can not connect to the public telecommunications network, seriously affecting the normal use of mobile phone users.According to the investigators, criminal suspects usually "pseudo base station" equipment placed in the car, driving slowly on the road, or the car parked in a specific area, engaged in SMS fraud, advertising and other criminal activities.SMS fraud in the form of two main: First, "wide thin collection", the suspects in the banks, shopping malls and other crowded places to a variety of remittance names to a certain radius within the scope of the mass phone to send fraud messages; Type ", the suspect screened out the" mantissa better "phone number, in the name of the name to send text messages, in their friends and family, colleagues and other acquaintances in the implementation of targeted fraud.The use of "pseudo base station" engaged in advertising marketing, mainly for their own business to find customers, or orders from other units, according to the amount of fees.Ministry of Public Security official said, the use of "pseudo-base station" equipment to commit a crime seriously endanger the national communications security, disrupt the social and public order, damage the legitimate rights and interests of the masses.Such equipment once the ulterior motives of the organization or personal use, fraudulent use of the name of the authority of the state fabricated, send false information, resulting in social impact is more difficult to measure.The focus of combat operations is the direct deployment of the Ministry of Public Security command, the implementation of the first fight against the focus, the public security organs will always maintain a crackdown on high pressure situation, and resolutely safeguard the vital interests of the people.(End)

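For example, the sentence annotator produces only one sentence for the document above, even though it contains many periods directly followed by the next sentence (e.g. "sets.This", "SMS.According").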

This causes many failures when annotating a document sentence by sentence.

mssammon commented 7 years ago

This is really a tokenization/sentence-splitter issue: the sentence annotator relies on the boundaries that the tokenizer provides.
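A minimal Python sketch of the dependency described above, assuming a simplified tokenizer that splits on whitespace and detaches a period only when it ends a whitespace-delimited chunk. The names `tokenize`, `split_sentences`, and `insert_missing_spaces` are hypothetical illustrations, not cogcomp-nlp APIs. The sketch shows why a glued period such as "sets.This" never becomes a sentence boundary, and how an assumed regex pre-pass that inserts the missing space could work around it before the text reaches the tokenizer.

```python
import re
from typing import List

# HYPOTHETICAL, simplified stand-ins for a tokenizer and a sentence splitter.
# They are not the cogcomp-nlp implementations; they only model the
# dependency in which sentence boundaries come from token boundaries.


def tokenize(text: str) -> List[str]:
    """Split on whitespace and detach a period only when it ends a chunk.

    A period glued to the next word (e.g. "sets.This") stays inside a
    single token, so no standalone "." token is produced for it.
    """
    tokens: List[str] = []
    for chunk in text.split():
        if len(chunk) > 1 and chunk.endswith("."):
            tokens.extend([chunk[:-1], "."])
        else:
            tokens.append(chunk)
    return tokens


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Close a sentence after every standalone "." token."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok == ".":
            sentences.append(current)
            current = []
    if current:  # trailing text without a final period
        sentences.append(current)
    return sentences


# ASSUMED heuristic: a period between a lowercase letter or digit and an
# uppercase letter, with no whitespace, is treated as a missing space.
# Requiring a lowercase letter or digit before the period leaves
# all-caps abbreviations such as "U.S.A" untouched.
_GLUED_PERIOD = re.compile(r"(?<=[a-z0-9])\.(?=[A-Z])")


def insert_missing_spaces(text: str) -> str:
    """Insert a space after a period that is immediately followed by a capital."""
    return _GLUED_PERIOD.sub(". ", text)


if __name__ == "__main__":
    sample = "Seized 96 sets.This year, reports rose.According to police, it is new."
    print(len(split_sentences(tokenize(sample))))  # 1
    print(len(split_sentences(tokenize(insert_missing_spaces(sample)))))  # 3
```

The pre-pass is only a workaround; as noted in the comment above, the proper fix belongs in the tokenizer itself. The heuristic also trades recall for safety: because it requires a lowercase letter or digit before the period, it leaves "SMS.According" from the example document unsplit rather than risk breaking abbreviations.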