filiph / linkcheck

Fast link checker
https://pub.dartlang.org/packages/linkcheck
MIT License

Skip patterns ending with # do not seem to work. #4

Closed: chalin closed this issue 7 years ago

chalin commented 7 years ago

I've been testing the skip pattern /angular/guide/server-communication# against site-webdev.

Here is part of the debug output:

Crawl will start on the following URLs: [http://localhost:4001/]
Crawl will check pages only on URLs satisfying: {http://localhost:4001/**}
Crawl will skip links that match patterns: UrlSkipper</angular/api/.*apiFilter, data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg', /angular/api/, /angular/guide/router(\.html)?($|#), /angular/guide/change-log.html$, /angular/cookbook/, /angular/guide/appmodule.html$, /angular/guide/server-communication#, /angular/api/static-assets/fonts, /angular/api/(docs|examples)/, /angular/api/.*/index/>
Crawl will check the following servers (and their robots.txt) first: {localhost:4001}
...

http://localhost:4001/angular/guide/server-communication
- (533:18) 'RxJS Obs..' => http://localhost:4001/angular/guide/server-communication#rxjs (HTTP 200 but missing anchor)
- (535:18) 'Enabling..' => http://localhost:4001/angular/guide/server-communication#enable-rxjs-operators (HTTP 200 but missing anchor)
...

Stats:
   14465 links
     331 destination URLs
     347 URLs ignored
      12 warnings
       0 errors

It should be skipping .../server-communication#rxjs (and the #enable-rxjs-operators link), since both contain the skip pattern.
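A pattern that ends in # can only match while the fragment is still attached to the URL being tested. The Python snippet below illustrates this under one assumption: that each skip pattern is a regular expression searched anywhere in the URL string, which is consistent with entries such as /angular/guide/router(\.html)?($|#) in the log above. It is an illustration only, not linkcheck's Dart code.

```python
import re
from urllib.parse import urldefrag

# Skip pattern from the report. Assumption: it is treated as a regular
# expression that may match anywhere in the URL string.
SKIP_PATTERN = re.compile(r"/angular/guide/server-communication#")

link = "http://localhost:4001/angular/guide/server-communication#rxjs"

# Test the link exactly as it appears on the page (fragment included).
matches_full = SKIP_PATTERN.search(link) is not None

# Test the link after the fragment has been split off.
url_without_fragment, fragment = urldefrag(link)
matches_defragged = SKIP_PATTERN.search(url_without_fragment) is not None

print(matches_full)       # True: the '#' is still in the string
print(matches_defragged)  # False: nothing left for the trailing '#' to match
```

If the skip check only ever sees the fragment-free URL, a pattern like this one can never match, no matter how it is written.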

filiph commented 7 years ago

OK, note to self: we're currently decoupling fragments (#anchor) from new destination URLs so that linkcheck accesses each physical URL only once. (Otherwise, it would treat /path#anchor1 and /path#anchor2 as different URLs that both need checking.)
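The decoupling described above can be sketched roughly as follows: strip the fragment from every link, key the results by the fragment-free URL, and remember which fragments were requested so they can be checked against the anchors of a single fetched page. This is a minimal Python sketch; group_by_destination is a hypothetical name, and linkcheck's real implementation (in Dart, around its Destination class) may be structured differently.

```python
from urllib.parse import urldefrag


def group_by_destination(links):
    """Map each fragment-free URL to the set of fragments linked on it.

    Hypothetical helper: each key only needs to be fetched once, and the
    collected fragments can later be checked against that one response.
    """
    destinations = {}
    for link in links:
        url, fragment = urldefrag(link)
        fragments = destinations.setdefault(url, set())
        if fragment:
            fragments.add(fragment)
    return destinations


links = [
    "http://localhost:4001/angular/guide/server-communication#rxjs",
    "http://localhost:4001/angular/guide/server-communication#enable-rxjs-operators",
    "http://localhost:4001/angular/guide/server-communication",
]
destinations = group_by_destination(links)
print(len(destinations))  # 1: three links, one physical URL to fetch
```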

But skipping by fragment should still work, so we need to move the skipping logic a bit higher up, before the fragment is decoupled from the URL. We still need to make sure the Destination gets created.
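A rough sketch of that reordering: run the skip patterns on the full link first, split off the fragment only afterwards, and still create a destination record for links that were skipped. The Destination dataclass and process_links below are hypothetical Python stand-ins, not linkcheck's API. Keeping skipped links on the record (so they can, for example, be counted as ignored) is this sketch's reading of "make sure the Destination gets created", not a confirmed description of the 1.0.1 fix.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urldefrag


@dataclass
class Destination:
    # Hypothetical stand-in for linkcheck's Destination; fields are illustrative.
    url: str
    fragments: set = field(default_factory=set)       # anchors still to verify
    skipped_links: list = field(default_factory=list)  # links ignored by a pattern


def process_links(links, skip_patterns):
    destinations = {}
    for link in links:
        # 1. Check skip patterns on the full link, fragment included,
        #    so a pattern ending in '#' still has a '#' to match.
        skipped = any(p.search(link) for p in skip_patterns)
        # 2. Only now split the fragment off the URL.
        url, fragment = urldefrag(link)
        # 3. Create the destination either way, so skipped links are still recorded.
        if url not in destinations:
            destinations[url] = Destination(url)
        dest = destinations[url]
        if skipped:
            dest.skipped_links.append(link)
        elif fragment:
            dest.fragments.add(fragment)
    return destinations


patterns = [re.compile(r"/angular/guide/server-communication#")]
links = [
    "http://localhost:4001/angular/guide/server-communication#rxjs",
    "http://localhost:4001/angular/guide/server-communication#enable-rxjs-operators",
]
result = process_links(links, patterns)
dest = result["http://localhost:4001/angular/guide/server-communication"]
print(dest.fragments)           # set(): no anchors left to report as missing
print(len(dest.skipped_links))  # 2
```

Because both fragment links are skipped before decoupling, no "missing anchor" warning would be produced for them, while the destination itself still exists for bookkeeping.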

filiph commented 7 years ago

Fixed in 1.0.1 by https://github.com/filiph/linkcheck/commit/d213f2c4d1b51bf7595dc25701aa09757f693926. Tried this on site-webdev and it seems to work well now. Please do report any irregularities.