Open dg7692 opened 2 years ago
Thanks for the feedback Daniel! :)
Yeah, I wanted to add some LaTeX cleaning inside this library and handle different file types (.tex, .docx, ...), but at the moment, the best approach is to use existing tools that extract only the text from these types of files.
For example, for LaTeX files, I found that detex does a pretty good job (it comes with the standard installation of LaTeX).
So what I tried is:
detex main.tex > main_detexed.txt
so that the LaTeX commands get removed and the plain text is saved to the output file, and then
formal-writing-checker "$(cat main_detexed.txt)"
to run this library on the text only.
It's one more step, but this does the trick in many cases.
With the new version 0.0.4 you can also run this directly in a pipe without the need for an intermediate file:
detex main.tex | formal-writing-checker
Limitations:
- if conditions in LaTeX: they are simply ignored, and everything is passed down the pipeline
- comment: its content is still checked and not ignored

Martino
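One possible workaround for the comment limitation above is to pre-filter the .tex source before it reaches detex. The sketch below assumes the unwanted text sits in \begin{comment} ... \end{comment} environments, each delimiter on its own line; that is a guess about the setup, not something confirmed in this thread. strip_comment_env is a hypothetical helper name and is not part of detex or formal-writing-checker.

```shell
#!/bin/sh
# Hypothetical helper, not part of detex or formal-writing-checker.
# Deletes every line from one containing \begin{comment} through the next
# line containing \end{comment}, inclusive. It is line-based, so it assumes
# the two delimiters are on separate lines.
strip_comment_env() {
    sed '/\\begin{comment}/,/\\end{comment}/d'
}

# Usage sketch. It needs detex and formal-writing-checker on PATH and
# assumes detex reads standard input when no file name is given:
#   strip_comment_env < main.tex | detex | formal-writing-checker
```

This keeps the pipe free of intermediate files. It does nothing for the if conditions, which would need an actual TeX-aware tool.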
The tool is nice; an additional adjustment that can strip out LaTeX commands (such as those starting a document), or at least exclude them from the sentence length, would be really helpful.