-
https://stackoverflow.com/questions/1962389/what-is-the-state-of-the-art-in-html-content-extraction
Links:
https://pypi.org/project/boilerpy3/
https://github.com/kohlschutter/boilerpipe
https://…
-
Hello,
I am having trouble using dragnet with Python 3.9. In particular, I get an error like this when importing dragnet:
```
root@2e4bbb389174:/home# python3
Python 3.9.2 (default, Feb 19 2021,…
-
>If the central server is able to learn the identity of the device reporting an accessory or the identity of the owner requesting the location of an accessory, then it can infer information about that…
-
We're going to need to do some amount of research into potential techniques, public datasets, etc. in order to determine what direction we should head here.
Relevant Publications:
- [A Machine Lear…
-
https://pleiades.stoa.org/home
Seems to be due to mixed content blocking. The page is served over HTTPS, but leaflet.css seems to be referenced via an HTTP URL. In Chrome, if one chooses to "load u…
-
I can't run dragnet if I simply pip install it. I am trying to compile it manually, but pip install -r requirements fails. I am considering using the Dockerfile, but I'm not sure how that will interact …
-
Hi, I'm currently compiling additional, more modern HTML documents with gold-standard content + comments for use in training dragnet models, and I have a few questions:
- Should I consider the text…
-
I'm occasionally getting `BlockifyError`s caused by malformed encoding values set [here](https://github.com/dragnet-org/dragnet/blob/master/dragnet/blocks.pyx#L843). Here's the tail of the traceback:…
-
Hi Matt, Dan,
thanks for this wonderful library.
While training some augmented models, I noticed that there are some steps in the process which could benefit a lot from parallelization.
There a…