deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.87k stars 593 forks

.tex Latex support #264

Closed: impredicative closed this issue 4 years ago

impredicative commented 5 years ago

Is it possible to get support for .tex (LaTeX) files?

jpweytjens commented 5 years ago

@impredicative What specifically do you have in mind with .tex support? In general, parsing .tex files and stripping out all the LaTeX code is a nontrivial task. Tools such as DeTeX and pandoc can convert .tex files to .txt files, which textract can then parse.

If this is sufficient, I encourage you to open a PR that adds support for either of these tools. If you have anything else in mind, please elaborate.
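As a rough sketch of that conversion step (not textract's actual implementation), the snippet below shells out to pandoc or detex and returns the plain text. The helper names `build_command` and `tex_to_text` are hypothetical, and it assumes the chosen tool is installed and on the PATH.

```python
import shutil
import subprocess


def build_command(tex_path, tool="pandoc"):
    """Return the argument list that converts a .tex file to plain text on stdout."""
    if tool == "pandoc":
        # pandoc reads LaTeX and writes its "plain" text format to stdout
        return ["pandoc", "--from", "latex", "--to", "plain", str(tex_path)]
    if tool == "detex":
        # detex strips TeX/LaTeX constructs and writes the remainder to stdout
        return ["detex", str(tex_path)]
    raise ValueError(f"unsupported tool: {tool}")


def tex_to_text(tex_path, tool="pandoc"):
    """Run the external converter and return its output as a string."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not installed or not on PATH")
    result = subprocess.run(
        build_command(tex_path, tool),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the converter fails
    )
    return result.stdout
```

Calling `tex_to_text("paper.tex")` (a placeholder filename) would return the extracted text, which could be written to a .txt file and then passed to `textract.process`.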

jpweytjens commented 4 years ago

Support for .tex files will be added in the upcoming version with pandoc and opendetex as parsers.