Open aleks-v-k opened 4 years ago
This PR adds an option to minimize the text extracted from HTML.
The main reason for these changes: the parser gets OOM-killed on some large HTML files (ones that contain a lot of whitespace plus a table).
Would you like to accept such changes? If yes, then I will add tests to cover the new code.
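
To illustrate what "minimizing" extracted text could mean here, a minimal sketch follows. It is not the code from this PR: the function name `minimize_text` and the exact rules (collapse runs of spaces and tabs, trim each line, cap consecutive blank lines at one) are assumptions chosen for illustration only.

```python
import re

# Hypothetical helper for illustration; not the implementation in this PR.
_BLANK_RUN = re.compile(r"[ \t\f\v]+")   # runs of horizontal whitespace
_NEWLINE_RUN = re.compile(r"\n{3,}")     # three or more consecutive newlines


def minimize_text(text: str) -> str:
    """Collapse whitespace runs in already-extracted text."""
    # Collapse horizontal whitespace inside each line and trim its ends.
    lines = (_BLANK_RUN.sub(" ", line).strip() for line in text.splitlines())
    collapsed = "\n".join(lines)
    # Keep at most one empty line between blocks of text.
    return _NEWLINE_RUN.sub("\n\n", collapsed).strip()


if __name__ == "__main__":
    sample = "cell  one \t cell two   \n\n\n\n   next   row  "
    print(repr(minimize_text(sample)))
```

This only shows the whitespace side of the problem. Whether it would be enough to avoid the OOM kill depends on where the parser actually allocates its memory, which the description above does not specify.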
Hello, I've recently been made a maintainer of this project. I'd be interested in these changes. I'd also be interested in a selectolax-based text extractor if you were feeling adventurous.