This project is a simple wrapper around the excellent and robust Tika text extraction Java library. It produces two NuGet packages.
The best way to get started is to create a TextExtractor object and call one of the Extract methods.
// using TikaOnDotNet.TextExtraction;
var textExtractor = new TextExtractor();
var wordDocContents = textExtractor.Extract(@".\path\to\my favorite word.docx");
var webPageContents = textExtractor.Extract(new Uri("https://google.com"));
Take a look at our tests for more usage examples.
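As a follow-up to the snippet above, here is a minimal sketch of reading what an Extract call returns. It assumes the result object exposes Text, ContentType, and Metadata properties; those names are an assumption about the library's result type, so check them against the version you have installed. The file path is a placeholder.

```cs
using System;
using TikaOnDotNet.TextExtraction;

class Program
{
    static void Main()
    {
        var textExtractor = new TextExtractor();

        // Placeholder path; point this at a real document on disk.
        var result = textExtractor.Extract(@".\path\to\my favorite word.docx");

        // Assumed property names (ContentType, Text, Metadata);
        // verify them against the installed package version.
        Console.WriteLine(result.ContentType);
        Console.WriteLine(result.Text);

        // Assumed to be a dictionary of string keys and string values.
        foreach (var entry in result.Metadata)
        {
            Console.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

Printing the content type and metadata alongside the text is a quick way to confirm the document was detected as the format you expected.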
Have an idea to make this project better? Great! Start out by taking a look at our Contributing Guide.
Search the Issues first, as your problem may be a common one. If you don't find your problem there, please create an issue. Contributors here will chime in when they can.