Closed dandv closed 7 months ago
Yeah, the idea of the project is great, but it lacks so many features to make it perfect.
Absolutely all good if this isn't the right project for you. Work is actively underway to keep improving the project for the custom GPTs use case, and specific feedback and PRs to improve things are always highly appreciated.
Thanks Steve. I understand this is open source, I know how it works. I've made several suggestions already.
I'm simply asking whether it wouldn't be more productive to create an output plugin for an established crawler, rather than reinvent the crawling wheel when the only differentiating feature is rather trivial, if I understand correctly (outputting the bare text extracted from an HTML element to a JSON file).
Could you suggest some examples of well-established crawlers you think integration with would be better?
This project is built on crawlee, which is a pretty robust crawler, but we're certainly open to better alternatives.
How exactly is this project different from an established crawler that would just dump the HTML text into the .html field of a JSON array? It's got 12k stars, but it lacks basic features like canonicalizing links (see #73) or preserving links (#74).
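For anyone wondering what "canonicalizing links" could look like in practice, here is a minimal sketch. It assumes canonicalizing means resolving each href against the page URL and dropping the fragment, which may not be exactly what #73 asks for. The helper name `canonicalizeLink` is hypothetical and is not part of this project or of crawlee; it only uses the standard `URL` class.

```typescript
// Hypothetical helper for illustration only; not an API of this project or of crawlee.
// Assumption: "canonical" here means absolute, http(s)-only, and without a fragment.
function canonicalizeLink(href: string, pageUrl: string): string | null {
  try {
    // Resolve relative hrefs against the URL of the page they were found on.
    const url = new URL(href, pageUrl);
    // Skip links that do not point at a web page (mailto:, javascript:, etc.).
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      return null;
    }
    // A fragment points inside a document, not at a different document.
    url.hash = "";
    return url.toString();
  } catch {
    // The href could not be parsed as a URL.
    return null;
  }
}
```

Called as `canonicalizeLink("../docs/intro#setup", "https://example.com/guide/start/")`, this would resolve the link to an absolute URL under `/guide/docs/`. Whether query strings or trailing slashes should also be normalized is a separate design decision that the sketch leaves alone.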