Tessmore / sbd

Sentence Boundary Detection in JavaScript for Node.js. http://tessmore.github.io/sbd/
MIT License

Sentence Boundary Detection (SBD)

Split text into sentences with a vanilla rule-based approach (i.e. it works ~95% of the time).
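To give a feel for what "rule-based" means here, the following is a minimal sketch of the general idea, not the code this library actually uses. The function name `naiveSentences` and the tiny `ABBREVIATIONS` list are hypothetical and exist only for illustration. A word ends a sentence when it finishes with `.`, `!` or `?`, is not a known abbreviation, and is followed by a capitalized word (or by nothing).

```javascript
// Illustrative abbreviation list (hypothetical, far smaller than a real one).
var ABBREVIATIONS = ['jan', 'sen', 'dr', 'mr', 'mrs'];

// Toy rule-based splitter: NOT the sbd implementation, just the idea.
function naiveSentences(text) {
    var trimmed = text.trim();
    if (trimmed === '') {
        return [];
    }

    var words = trimmed.split(/\s+/);
    var sentences = [];
    var current = [];

    for (var i = 0; i < words.length; i++) {
        var word = words[i];
        current.push(word);

        // Rule 1: the word must end with sentence punctuation.
        var endsWithPunctuation = /[.!?]$/.test(word);

        // Rule 2: the word must not be a known abbreviation such as "Sen.".
        var stem = word.replace(/[.!?]+$/, '').toLowerCase();
        var isAbbreviation = ABBREVIATIONS.indexOf(stem) !== -1;

        // Rule 3: the next word must start with a capital letter (or not exist).
        var next = words[i + 1];
        var nextLooksLikeStart = next === undefined || /^[A-Z]/.test(next);

        if (endsWithPunctuation && !isAbbreviation && nextLooksLikeStart) {
            sentences.push(current.join(' '));
            current = [];
        }
    }

    // Keep any trailing words that never reached a boundary.
    if (current.length > 0) {
        sentences.push(current.join(' '));
    }
    return sentences;
}

console.log(naiveSentences('Dr. Smith arrived on Jan. 20. He left early.'));
// [ 'Dr. Smith arrived on Jan. 20.', 'He left early.' ]
```

A sketch like this breaks quickly on real text (for example an abbreviation followed by a proper noun), which is why such approaches only work most of the time.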

Demo

http://tessmore.github.io/sbd/

Installation

Use npm or yarn:

$ npm install sbd

$ yarn add sbd

How to

var tokenizer = require('sbd');

var optional_options = {};
var text = "On Jan. 20, former Sen. Barack Obama became the 44th President of the U.S. Millions attended the Inauguration.";
var sentences = tokenizer.sentences(text, optional_options);

// [
//  'On Jan. 20, former Sen. Barack Obama became the 44th President of the U.S.',
//  'Millions attended the Inauguration.',
// ]

Optional options

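var options = {
    "newline_boundaries"  : false, // treat newlines as sentence boundaries
    "html_boundaries"     : false, // treat HTML block tags (e.g. <br>, </p>) as sentence boundaries
    "sanitize"            : false, // strip HTML from the text before splitting
    "allowed_tags"        : false, // tags to keep when sanitizing
    "preserve_whitespace" : false, // keep the original whitespace between words and sentences
    "abbreviations"       : null   // custom abbreviation list, e.g. for other languages
};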
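In practice you usually want to change only one or two options. The sketch below shows one way to do that. `withDefaults` and `splitOnNewlines` are hypothetical helpers, not part of sbd, and the example assumes (as the option name suggests) that `newline_boundaries: true` makes newlines count as sentence boundaries. The option names and default values are the ones listed above.

```javascript
// Default values, copied from the options listing in this README.
var DEFAULTS = {
    "newline_boundaries"  : false,
    "html_boundaries"     : false,
    "sanitize"            : false,
    "allowed_tags"        : false,
    "preserve_whitespace" : false,
    "abbreviations"       : null
};

// Hypothetical helper: returns a fresh options object with the given
// overrides applied on top of the defaults. DEFAULTS is never mutated.
function withDefaults(overrides) {
    return Object.assign({}, DEFAULTS, overrides || {});
}

// Hypothetical helper: split text with newline boundaries switched on.
// sbd is required lazily, so it only needs to be installed when this runs.
function splitOnNewlines(text) {
    var tokenizer = require('sbd');
    return tokenizer.sentences(text, withDefaults({ "newline_boundaries": true }));
}
```

Calling `splitOnNewlines("First line\nSecond line")` passes the merged options straight to `tokenizer.sentences`, so every option you did not override keeps its default.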

Contributing

You can run unit tests with npm test.

If you feel something is missing, you can open an issue stating the problem sentence and the desired result. If the code is unclear, give me an @mention. Pull requests are welcome.

Building the (minified) scripts

npm install -g browserify

npm run-script build