Open DavidNemeskey opened 2 years ago
JusText seems to remove too much content, and its accuracy does not seem very high. We need a better tool for boilerplate removal.
Options:
The first two options (and probably the last one as well) use deep learning (DL) and need training data. For now, let's experiment with Trafilatura.
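
A first experiment could be as small as the sketch below. It is not taken from this repository: the helper name `extract_main_text` and the sample page are made up for illustration. It assumes the `trafilatura` package is installed (`pip install trafilatura`) and uses `favor_recall=True` on the assumption that keeping more text is the right trade-off, since the complaint about JusText is that it removes too much.

```python
from typing import Optional


def extract_main_text(html: str) -> Optional[str]:
    """Return the main text of an HTML page, or None if nothing was extracted.

    Hypothetical helper for a quick comparison against JusText output.
    Also returns None when trafilatura is not installed.
    """
    try:
        import trafilatura  # third-party dependency
    except ImportError:
        return None
    # favor_recall=True asks Trafilatura to err on the side of keeping text.
    return trafilatura.extract(html, favor_recall=True)


# Made-up sample page: navigation and footer are the boilerplate.
SAMPLE_HTML = """
<html><body>
  <nav><a href="/">Home</a> | <a href="/about">About</a></nav>
  <article>
    <h1>Sample article</h1>
    <p>This paragraph stands in for the main content of a crawled page.
    It is long enough to look like real running text to an extractor.</p>
    <p>A second paragraph continues the article with more running text,
    so that the content block clearly outweighs the navigation links.</p>
  </article>
  <footer>Copyright notice and cookie banner</footer>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

Running both extractors on the same sample of crawled pages and comparing the retained text side by side would give a rough idea of whether Trafilatura keeps more of the real content.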