Closed: exortech closed this PR 1 year ago.
Latest commit: 5aee276f17d1c3bfbb9dd02ad9e63e6fd80abde4
The changes in this PR will be included in the next version bump.
Hey @exortech, I am planning to release a new major version pretty soon (in the next few days) on the next branch. Any chance you could reimplement this, targeting the next branch instead?
Thanks for taking the time to send a PR by the way!
Sure. No problem.
I have another change I'd like to propose: expose crawler.parseScriptTags as a configuration parameter. simplecrawler's built-in parser is fairly basic and generally does a poor job of pulling URIs out of script tags. This produces a lot of false positives, especially when I'm also using simplecrawler to detect broken links. Changing the code to stop parsing script tags altogether would be simplest, but it would break backwards compatibility. So the intention is to make disabling script parsing configurable.
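A minimal sketch of the proposed option, assuming it is passed through to the crawler's `parseScriptTags` property as described above. `CrawlOptions`, `ScriptParsingCrawler`, and `applyCrawlerOptions` are hypothetical names for illustration, not existing lighthouse-parade code. The key design choice is that leaving the option unset keeps the crawler's current default, so existing behavior only changes for users who opt out.

```typescript
// Hypothetical user-facing option shape (illustrative name, not a real
// lighthouse-parade type).
interface CrawlOptions {
  /** When false, skip extracting URLs from <script> contents. */
  parseScriptTags?: boolean;
}

// Minimal structural type for the one crawler property this sketch touches.
// In practice this would be a simplecrawler instance.
interface ScriptParsingCrawler {
  parseScriptTags: boolean;
}

// Copy the user's choice onto the crawler. If the option is undefined, the
// crawler's existing default is left untouched (backwards compatible).
function applyCrawlerOptions(
  crawler: ScriptParsingCrawler,
  options: CrawlOptions,
): void {
  if (options.parseScriptTags !== undefined) {
    crawler.parseScriptTags = options.parseScriptTags;
  }
}

// Usage with a stand-in object; a real caller would pass the crawler
// instance before starting the crawl.
const crawler: ScriptParsingCrawler = { parseScriptTags: true };
applyCrawlerOptions(crawler, { parseScriptTags: false });
```

Making the option optional, rather than a required boolean, is what lets the default stay as it is today while still allowing script parsing to be switched off for broken-link checks.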
What do you think? Does that align with the functionality you want to support in lighthouse-parade? Cheers, Owen.
In reply to Caleb Eby's comment of Mon, Dec 19, 2022 at 3:06 PM.
-- Owen Rogers | Exortech Consulting @exortech https://twitter.com/exortech | http://exortech.com/
To be honest, I am not a big fan of the simplecrawler library (and the library is now deprecated as well). I would definitely be open to using a different library that may avoid some of simplecrawler's issues, and I'd also be open to adding parameters to configure the behavior of that new crawler. But I think I will release the next version before I make that change, in a future major version.
Makes sense. I noticed that you have replacing simplecrawler on your task list for the next version in #117. So I guess it makes sense to hold off on this change until an alternate crawler is in place.
Closing this PR to submit a new PR for the next branch.
Resolves #126