collinbarrett / FilterLists

:shield: The independent, comprehensive directory of filter and host lists for advertisements, trackers, malware, and annoyances.
https://filterlists.com
MIT License

BOT: url validation errors #836

Closed collinbarrett closed 4 years ago

collinbarrett commented 5 years ago

This issue is auto-generated by the FilterLists.Agent. It is updated at about 9am UTC daily.

We rely on the help of the community to ensure that the FilterLists site data remains up-to-date. The URLs listed below have been automatically flagged and may need to be updated. Please consider submitting a PR against this issue updating some or all of the URLs accordingly.

Thanks for your contributions!

License.json

FilterList.json

Maintainer.json

Software.json

DandelionSprout commented 5 years ago

Just a quick note before I begin planning how to fix the rest of the links:

The links that the agent(?) claims should be changed to something involving /en/ or /en-US/ are false alarms: most or all such cases are language-based redirects, which most users will prefer over the agent's suggestions.

For instance, as someone who uses Norwegian as my main browser language, https://adguard.com/ leads to https://adguard.com/no/welcome.html.
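One way a validator could avoid this class of false alarm is to recognize when a redirect only adds a locale segment to the path on the same host. The following is a minimal sketch in Python, not the Agent's actual code; the name `is_locale_redirect` and the locale pattern are assumptions made for illustration, and the heuristic will also match any other two-letter path segment.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern for a locale-like path segment: "en", "no", "en-US", ...
_LOCALE_SEGMENT = re.compile(r"[a-z]{2}(-[A-Za-z]{2})?")


def is_locale_redirect(original: str, final: str) -> bool:
    """Return True if `final` stays on the host of `original` and its path
    begins with a locale-like segment that `original` does not begin with."""
    src, dst = urlsplit(original), urlsplit(final)
    if src.netloc.lower() != dst.netloc.lower():
        return False  # a redirect to another host is a different case

    src_segments = [s for s in src.path.split("/") if s]
    dst_segments = [s for s in dst.path.split("/") if s]
    if not dst_segments or not _LOCALE_SEGMENT.fullmatch(dst_segments[0]):
        return False

    # The original must not already start with that same segment.
    return not src_segments or src_segments[0] != dst_segments[0]
```

With a check like this, the redirect from https://adguard.com/ to https://adguard.com/no/welcome.html would be classified as language-based, and the stored URL would be left unchanged.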

DandelionSprout commented 5 years ago

Additionally, all of the agent's suggestions for GitCDN links are also false alarms, because its suggestions are in fact build-specific, whereas the existing links correctly redirect to whatever the newest build at the time is.

collinbarrett commented 5 years ago

Awesome, thanks. Yeah, the "Agent" validating URLs is very new, so I'm sure there will be some bugs.

DandelionSprout commented 5 years ago

Another bug in the agent that I've come across is that it doesn't seem to treat links containing # as valid, even though many websites correctly use it to jump to a specific page section.
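The fragment (the part after #) is resolved by the browser and is never sent to the server, so a validator can strip it before making the request and still keep the full link in the data. A minimal sketch in Python using the standard library's `urllib.parse.urldefrag`; the function names `request_target` and `same_resource` are hypothetical and do not come from the Agent.

```python
from urllib.parse import urldefrag


def request_target(url: str) -> str:
    """Return the URL to actually request: the input without its #fragment."""
    return urldefrag(url).url


def same_resource(a: str, b: str) -> bool:
    """Return True if two URLs differ at most in their #fragment."""
    return urldefrag(a).url == urldefrag(b).url
```

Comparing URLs with `same_resource` before reporting a change would keep a link such as `https://example.com/docs#install` from being flagged only because the server answers for `https://example.com/docs`.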

DandelionSprout commented 5 years ago

Would it also be possible to program the agent so that it checks whether a link is also available over HTTPS?

collinbarrett commented 5 years ago

It's supposed to do that^, but I'm not sure that feature is working. See here. I'll look into it.

DandelionSprout commented 5 years ago

If it does do that after all, then I can only presume that it forgets to repeat the check after a redirect, most notably for the links to http://cosmonote.blogspot.jp/ and https://energized.pro/support/.
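To make the "after a redirection" case concrete: the HTTPS probe would need to be built for the final URL as well as for the original one. A minimal sketch in Python, assuming the validator already knows both URLs; `https_candidate` and `https_probes` are hypothetical names, and the sketch only builds the URLs to try, it does not perform any request.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def https_candidate(url: str) -> Optional[str]:
    """Return the https:// form of a plain http:// URL, or None otherwise."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return None
    netloc = parts.netloc
    if netloc.endswith(":80"):
        netloc = netloc[:-3]  # drop the explicit default HTTP port
    return parts._replace(scheme="https", netloc=netloc).geturl()


def https_probes(original: str, final: str) -> List[str]:
    """Collect HTTPS URLs worth trying for the original URL and for the URL
    reached after following redirects, without duplicates."""
    probes: List[str] = []
    for url in (original, final):
        candidate = https_candidate(url)
        if candidate is not None and candidate not in probes:
            probes.append(candidate)
    return probes
```

Each returned probe would still have to be requested and compared with the HTTP response before the stored link is changed.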

collinbarrett commented 5 years ago

@DandelionSprout I just ran the validation again after #841 , and the list doesn't look much shorter. After some investigation, this seems to be the primary problem: #843

...I'll work on it.

collinbarrett commented 5 years ago

Daily updates of this issue are suspended until #843 is resolved.

collinbarrett commented 5 years ago

Re-enabled, and triggered a new run. It removed a lot of the false errors on GitHub links in the original post above, which were caused by 429 TOO MANY REQUESTS responses.
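A 429 says that the client is being rate-limited, not that the link is broken, so it makes sense to treat it as "try again later" rather than as a validation error. A minimal sketch of that idea in Python, not necessarily how the Agent handles it; the names `classify` and `retry_delay`, the status buckets, and the backoff defaults are all assumptions. Only the integer-seconds form of the `Retry-After` header is handled here; an HTTP-date value falls back to exponential backoff.

```python
from typing import Optional


def classify(status: int) -> str:
    """Map an HTTP status code to a coarse validation outcome."""
    if status == 429 or 500 <= status < 600:
        return "retry"  # rate-limited or server trouble: inconclusive
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "broken"


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt (attempt counts from 0).

    Uses the Retry-After header when it is a whole number of seconds,
    otherwise exponential backoff; both are limited to `cap`.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    return min(base * 2 ** attempt, cap)
```

A URL would then only be reported once its outcome is something other than "retry", or after a fixed number of attempts.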

collinbarrett commented 4 years ago

Closing for now until #940 opens a new Issue.