Open GregB712 opened 3 years ago
Hi @GregB712, I just stumbled upon this page while looking for issues related to the library. I'm the developer of trafilatura; may I ask on which kinds of pages the main texts were extracted properly? I'm curious to find bugs and trickier examples...
We need a better way to collect the main part of each page (preferably using the Beautiful Soup package). We currently use the trafilatura library.
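
For discussion, here is a rough sketch of what a Beautiful Soup based approach might look like. It is only a heuristic under stated assumptions, not how trafilatura works internally: the function name `extract_main_text` and the `NOISE_TAGS` list are hypothetical, and the idea that the main text lives in `<article>` or `<main>` will not hold for every page.

```python
from bs4 import BeautifulSoup

# Tags assumed to rarely contain main content (an illustrative guess, not a standard).
NOISE_TAGS = ["script", "style", "noscript", "nav", "header", "footer", "aside", "form"]


def extract_main_text(html: str) -> str:
    """Return a best-effort guess at the main text of an HTML page."""
    soup = BeautifulSoup(html, "html.parser")

    # Detach elements that are usually navigation, scripts, or page chrome.
    for tag in soup.find_all(NOISE_TAGS):
        tag.extract()

    # Prefer semantic containers; fall back to <body>, then the whole document.
    container = soup.find("article") or soup.find("main") or soup.body or soup

    # Collect non-empty paragraph texts, one per line.
    paragraphs = [p.get_text(" ", strip=True) for p in container.find_all("p")]
    paragraphs = [text for text in paragraphs if text]
    if paragraphs:
        return "\n".join(paragraphs)

    # No <p> elements found: return all remaining text in the container.
    return container.get_text(" ", strip=True)
```

Called on a fetched page's HTML string, this returns the paragraph text of the first `<article>` or `<main>` element if one exists. Pages that put their content in plain `<div>` elements would need extra heuristics such as text density or link density, which this sketch does not attempt.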