filiph / linkcheck

Fast link checker
https://pub.dartlang.org/packages/linkcheck
MIT License

Skip patterns are ignored for external links #27

Closed: chalin closed this issue 5 years ago

chalin commented 6 years ago

For example, the following skip pattern:

forum/flutter-dev

seems to be ignored for the external link https://groups.google.com/forum/#!forum/flutter-dev:

http://localhost:4002/tos
- (807:12) 'flutter-..' => https://groups.google.com/forum/#!forum/flutter-dev (HTTP 200 but missing anchor)

cc @Sfshaza

stas00 commented 5 years ago

I have the same issue. How do we deal with links like these? linkcheck tries to find the #!forum/flutter-dev anchor on the page.

I tried encoding the URL:

https://groups.google.com/forum/%23%21forum/fastai-diff

but Google doesn't decode the # part (%23), only the ! part (%21).
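A quick way to see why a "missing anchor" is reported, and why encoding changes things: a sketch using Python's standard `urllib.parse` (not linkcheck's own code). In the hash-bang URL, everything after the first `#` is the fragment, which a link checker looks up as an in-page anchor; percent-encoding `#` and `!` moves that text into the path, so the server has to decode it.

```python
from urllib.parse import urlsplit

# Hash-bang URL: the text after the first '#' becomes the fragment.
hashbang = urlsplit("https://groups.google.com/forum/#!forum/flutter-dev")
print(hashbang.path)      # /forum/
print(hashbang.fragment)  # !forum/flutter-dev

# Percent-encoded version: '%23%21...' stays in the path (urlsplit does
# not decode it), so there is no fragment left to check as an anchor.
encoded = urlsplit("https://groups.google.com/forum/%23%21forum/fastai-diff")
print(encoded.path)           # /forum/%23%21forum/fastai-diff
print(repr(encoded.fragment)) # ''
```

Whether the encoded form works therefore depends entirely on the server decoding `%23`, which, as noted above, Google Groups doesn't do.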

Hash-bang URLs have been deprecated for a while now (https://www.w3.org/blog/2011/05/hash-uris/), but Google Groups still hasn't been updated :(

As a workaround, I skip the whole URL prefix instead. My current skip-urls file:

# #! url is deprecated and causes a false negative report
https://groups.google.com/forum/
# github blocks robots
https://github.com/
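A minimal sketch of how a skip file like this could be applied, written in Python rather than linkcheck's Dart. It assumes, without checking linkcheck's source, that lines starting with `#` are comments, blank lines are ignored, and each remaining line is a regular expression that skips a URL if it matches anywhere in it. `load_skip_patterns` and `is_skipped` are hypothetical helper names.

```python
from __future__ import annotations

import re

# The skip file above. Assumption: '#' lines are comments.
SKIP_FILE = """\
# #! url is deprecated and causes a false negative report
https://groups.google.com/forum/
# github blocks robots
https://github.com/
"""


def load_skip_patterns(text: str) -> list[re.Pattern[str]]:
    """Compile every non-blank, non-comment line as a regular expression."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # hypothetical rule: skip blank lines and comments
        patterns.append(re.compile(line))
    return patterns


def is_skipped(url: str, patterns: list[re.Pattern[str]]) -> bool:
    """Assumption: a URL is skipped if any pattern matches anywhere in it."""
    return any(p.search(url) for p in patterns)


patterns = load_skip_patterns(SKIP_FILE)
print(is_skipped("https://groups.google.com/forum/#!forum/flutter-dev", patterns))  # True
print(is_skipped("https://github.com/filiph/linkcheck/issues/29", patterns))      # True
print(is_skipped("https://pub.dartlang.org/packages/linkcheck", patterns))       # False
```

Under the same assumptions, the original `forum/flutter-dev` pattern from this issue would also match the hash-bang URL, since it is a substring of `#!forum/flutter-dev`.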

filiph commented 5 years ago

Sorry for the late reply to this.

This seems to work for me. I've added a test case (https://github.com/filiph/linkcheck/commit/0b7a581717e12ab182fe9569e047df4303a84f06) and it passes (https://travis-ci.org/filiph/linkcheck/builds/450432074).

One thing to note: every line in the skip file is a regular expression. That means that if you want to skip a #! URL, you'll have to escape it properly. (I'm no regexp expert, but I think you'll have to do something like http://example\.com/#\!something.)
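To check which characters actually need escaping, here is a small sketch using Python's `re` as a stand-in for the Dart `RegExp` that linkcheck uses. For the characters involved here (`.`, `#`, `!`) the two behave the same way as far as I know, but this is not a test of linkcheck itself. It uses an unanchored search, which is an assumption about how skip patterns are applied.

```python
import re

url = "http://example.com/#!something"

# Pattern suggested above: '.' escaped, '!' escaped (the escape is harmless).
escaped = r"http://example\.com/#\!something"
# '#' and '!' are not special in a (non-verbose) regexp, so this works too.
unescaped_bang = r"http://example\.com/#!something"

for pattern in (escaped, unescaped_bang):
    print(bool(re.search(pattern, url)))  # True for both patterns

# '.' IS special: unescaped, it matches any character, so the pattern
# also matches URLs that only look similar.
print(bool(re.search(r"http://example.com/", "http://exampleXcom/")))   # True
print(bool(re.search(r"http://example\.com/", "http://exampleXcom/")))  # False
```

In other words, escaping `.` makes the pattern stricter, while escaping `!` or `#` changes nothing.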

If this resolves your issue, I'll treat this as a documentation bug.

stas00 commented 5 years ago

Thanks for getting back to me on this, @filiph. I ended up using the W3C checklink instead, since your tool is currently unreliable: https://github.com/filiph/linkcheck/issues/29

chalin commented 5 years ago

Thanks for looking into this, @filiph. I'll give it another try soon. (As for !, it shouldn't need escaping in a regexp like this.)

chalin commented 5 years ago

Strange, the same skip pattern indeed seems to be working now!