dragnet-org / dragnet

Just the facts -- web page content extraction
MIT License

Support Python 3.9 and 3.10 #111

Open matt-peters opened 2 years ago

matt-peters commented 2 years ago

Adds support for newer versions of Python and sklearn > 1.0.0.

Fixes #110 #101 #109 #107

rashadmoarref commented 1 year ago

It'd be great if this PR got some attention. I have verified that this branch builds on Python 3.9 with the following build prerequisites:

lxml==4.9.2
numpy==1.20.3
Cython==0.29.33

and that the default extract_content method works properly with the following dependency versions:

joblib==1.2.0
scikit-learn==1.1.2

ericluugg commented 4 months ago

For anyone interested, I've forked a branch for Python 3.10: https://github.com/ericluugg/dragnet3.10

These are the highest dependency versions I've been able to get working. Anything past scikit-learn 1.2.2 leads to errors when unpickling the old models.

lxml==5.1.1
numpy==1.26.3
Cython==3.0.10
scipy==1.14.0
scikit-learn==1.2.2
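
If it helps anyone reproduce this, here is a minimal sketch for checking an environment against the pins listed above before trying to load the pickled models. It only uses the standard library's importlib.metadata. The names PINS, installed_version and find_mismatches are made up for this example and are not part of dragnet, and the exact-match check is an assumption on my part (a looser comparison may be fine for some packages).

```python
from importlib import metadata

# Pins copied from the Python 3.10 fork comment above.
PINS = {
    "lxml": "5.1.1",
    "numpy": "1.26.3",
    "Cython": "3.0.10",
    "scipy": "1.14.0",
    "scikit-learn": "1.2.2",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not met exactly.

    `lookup` is injectable so the check can be exercised without touching
    the real environment (hypothetical helper, not a dragnet API).
    """
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Report every package whose installed version differs from its pin.
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that differs from its pin and prints nothing when the environment matches. A scikit-learn mismatch is the one to look at first, given the unpickling errors mentioned above.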