dragnet-org / dragnet

Just the facts -- web page content extraction
MIT License

Is this project maintained? #110

Open swetepete opened 2 years ago

swetepete commented 2 years ago

I can't run dragnet if I simply pip install it. I am trying to compile it manually, but pip install -r requirements fails. I am considering using the Dockerfile, but I'm not sure how that will interact with the other tools I want to use dragnet inside of.

It seems like a pretty big project to figure out how to install this properly. I'm willing to take it on, but I'm curious: is this project still being maintained? Why doesn't the pip installation work? Is it out of date?

Thanks very much

robh71 commented 2 years ago

I have this running by itself as a web service because I couldn't get it working on anything above Python 3.7.

Breaking this out of my code base as a service has removed every issue I had with this package. While it was in my application code it was a constant source of irritation.
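For anyone reading along, the "separate service" setup described above could look roughly like the following. This is only a minimal sketch using the Python standard library, not robh71's actual code. The names `placeholder_extract`, `start_server` and `extract_remote` are made up for illustration, and the stand-in extractor is an assumption: in a real deployment you would pass in dragnet's extraction function inside the pinned Python 3.7 environment.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def placeholder_extract(html):
    """Hypothetical stand-in; the real service would call dragnet here."""
    return html.strip()


def make_handler(extract_fn):
    """Build a request handler class that wraps the given extractor."""

    class ExtractHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The request body is the raw HTML to extract from.
            length = int(self.headers.get("Content-Length", 0))
            html = self.rfile.read(length).decode("utf-8", errors="replace")
            body = json.dumps({"content": extract_fn(html)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            # Silence the default per-request logging.
            pass

    return ExtractHandler


def start_server(extract_fn=placeholder_extract, host="127.0.0.1", port=0):
    """Start the service in a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), make_handler(extract_fn))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def extract_remote(html, port, host="127.0.0.1"):
    """Client side: what the main application calls instead of importing dragnet."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/", body=html.encode("utf-8"))
        return json.loads(conn.getresponse().read())["content"]
    finally:
        conn.close()
```

The main application then only needs `extract_remote`, so it can run on any Python version while the extractor stays in its own environment (for example, the Docker image mentioned earlier in the thread).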

matt-peters commented 2 years ago

None of the maintainers have had the bandwidth to maintain it in recent years. If anyone would like to take over maintenance, we welcome new contributors! I would like to see it run reliably on newer versions of Python, and I could help with updating the code and releasing a new version to pip. What errors are you seeing that prevent it from running on newer versions of Python? Are they the ones in these older issues? #101 #107

swetepete commented 2 years ago

@robh71 Awesome, thanks. So it works fine as long as you stay on Python 3.7?

@matt-peters I’m definitely open to trying to help. Let me know what you think needs to be done next. If I remember correctly, the specific error from the pip installation said something was wrong with gcc, but that seemed misleading, as if it wasn’t the real problem. I can double-check and dig deeper for sure.

theblackcat102 commented 2 years ago

I currently maintain a fork of dragnet called extractnet, which supports Python 3.7 through 3.9.

It is still based on the same features, but it uses more training data and replaces the extraction model with a neural network that does multi-task extraction (author, description, content).

matt-peters commented 2 years ago

FYI, there is a PR in #111 that adds support for newer versions of Python and sklearn (it's been tested on 3.9 and 3.10).