filiph / linkcheck

Fast link checker
https://pub.dartlang.org/packages/linkcheck
MIT License

Unhandled exception #40

Closed: LucasGorgal closed this issue 3 years ago

LucasGorgal commented 5 years ago
Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1      DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2      checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7      _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8      _StreamController._add (dart:async/stream_controller.dart:640:7)
#9      _StreamController.add (dart:async/stream_controller.dart:586:5)
#10     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14     _StreamController._add (dart:async/stream_controller.dart:640:7)
#15     _StreamController.add (dart:async/stream_controller.dart:586:5)
#16     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23     _StreamController._add (dart:async/stream_controller.dart:640:7)
#24     _StreamController.add (dart:async/stream_controller.dart:586:5)
#25     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:171:12)

LucasGorgal commented 5 years ago

I am using macOS High Sierra 10.13.6.

filiph commented 5 years ago

Sorry I didn't see this earlier. Could you please share the HTML file this failed on? If you're not sure, please share the result of linkcheck --debug <url>.

zepalmer commented 4 years ago

I'm having a similar problem, I think. Here's a page I've put on my server under the name sample.php:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Example Title</title>
<style type="text/css">
</style>
</head>

<body>
</body>
</html>

(Disclaimer: this is a file from an old site. I promise I don't use PHP for anything modern.)

Here's a log from my terminal illustrating the failure:

$ linkcheck --debug http://localhost/sample.php
Reading URLs:
http://localhost/sample.php
Crawl will start on the following URLs: [http://localhost/sample.php]
Crawl will check pages only on URLs satisfying: {http://localhost/sample.php**}
Crawl will skip links that match patterns: UrlSkipper<>
Crawl will check the following servers (and their robots.txt) first: {localhost}
Using 4 threads.
Checking robots.txt and availability of server: localhost
Added: http://localhost/sample.php to Worker<1> with 0ms delay
Server check of localhost complete.
Server check for localhost complete: connected, no robots.txt.
Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1      DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:326:48)
#2      checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7      _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8      _StreamController._add (dart:async/stream_controller.dart:640:7)
#9      _StreamController.add (dart:async/stream_controller.dart:586:5)
#10     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14     _StreamController._add (dart:async/stream_controller.dart:640:7)
#15     _StreamController.add (dart:async/stream_controller.dart:586:5)
#16     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23     _StreamController._add (dart:async/stream_controller.dart:640:7)
#24     _StreamController.add (dart:async/stream_controller.dart:586:5)
#25     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)
Killing unresponsive Worker<1>
Done checking: http://localhost/sample.php (connection failed) => 0 links
- BROKEN
All jobs are done or user pressed Ctrl-C
Deduping destinations
Closing the isolate pool
Broken links
Done crawling.                   

Provided URLs failing:
http://localhost/sample.php (connection failed)

Error. Couldn't connect or find any links. Have you started the server?

Is it possible that this is specifically a PHP-related thing? Despite the PHP URL, the result of an HTTP request should be a valid HTML document, so I'm not sure why it would fail.

Using linkcheck 2.0.9 with Dart 2.4.1 on Debian 10. Thanks for the excellent tool!

filiph commented 4 years ago

Hi, thanks for the detailed report!

It looks like the local server you're using isn't reporting the Content-Type (MIME type). I fixed the bug that crashes linkcheck in such instances, but beyond that there's not much I can do, unfortunately. I tentatively decided that in such cases linkcheck will try to parse the resource as if it were HTML, and assign a warning. That means you'll still get your site crawled, but you'll get a bazillion warnings on every page.
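
For readers following along, the fallback described above can be sketched roughly as follows. This is a minimal Python illustration, not linkcheck's actual Dart code; the function name resolve_media_type and the returned "assumed" flag are hypothetical names chosen for this example.

```python
from typing import Optional, Tuple


def resolve_media_type(content_type: Optional[str]) -> Tuple[str, str, bool]:
    """Return (primary_type, sub_type, assumed) for a Content-Type header value.

    `assumed` is True when the header was missing or unusable and the
    resource is being treated as HTML, so the caller can attach a warning
    instead of crashing on a null value.
    """
    if content_type is None or not content_type.strip():
        # No Content-Type at all: fall back to HTML and flag the guess.
        return ("text", "html", True)

    # Drop parameters such as ";charset=UTF-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    primary, _, sub = media_type.partition("/")
    if not primary or not sub:
        # Malformed value (e.g. "html"): same fallback as a missing header.
        return ("text", "html", True)
    return (primary, sub, False)
```

A caller would crawl the page as HTML in either case and emit a warning only when the third element is True.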

I said "looks like" above because I'm not 100% sure. It's possible there's some other reason why linkcheck doesn't see any content type. If so, please feel free to reopen this issue.

The fix will land shortly as 2.0.10. Run pub global activate linkcheck to upgrade.

zepalmer commented 4 years ago

Thanks for pushing this update! I've just checked the server I'm using. When accessing a page that doesn't crash linkcheck, I get response headers like

  HTTP/1.1 301 Moved Permanently
  Date: Mon, 02 Sep 2019 11:42:57 GMT
  Server: Apache/2.4.38 (Debian)
  Location: http://localhost/~zpalmer/cs21/
  Content-Length: 314
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1

  HTTP/1.1 200 OK
  Date: Mon, 02 Sep 2019 11:42:57 GMT
  Server: Apache/2.4.38 (Debian)
  Vary: Accept-Encoding
  Content-Length: 949
  Keep-Alive: timeout=5, max=99
  Connection: Keep-Alive
  Content-Type: text/html;charset=UTF-8

The page that does crash linkcheck produces these response headers:

  HTTP/1.1 200 OK
  Date: Mon, 02 Sep 2019 11:43:26 GMT
  Server: Apache/2.4.38 (Debian)
  Last-Modified: Thu, 29 Aug 2019 20:44:47 GMT
  ETag: "1f83-5914793818dc0"
  Accept-Ranges: bytes
  Content-Length: 8067
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
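
To make the difference between the two responses explicit, here is a small Python sketch that looks up a header in a raw response dump like the one above. The helper name header_value is hypothetical, and the header text is copied from this comment; it is only an illustration of the check, not part of linkcheck.

```python
from typing import Optional


def header_value(raw_response: str, name: str) -> Optional[str]:
    """Return the value of header `name` in a raw HTTP response head, or None."""
    for line in raw_response.splitlines()[1:]:  # skip the status line
        key, sep, value = line.partition(":")
        # Header names are case-insensitive; tolerate leading indentation.
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return None


# Response head of the page that crashes linkcheck, as pasted above.
CRASHING_RESPONSE = """HTTP/1.1 200 OK
Date: Mon, 02 Sep 2019 11:43:26 GMT
Server: Apache/2.4.38 (Debian)
Last-Modified: Thu, 29 Aug 2019 20:44:47 GMT
ETag: "1f83-5914793818dc0"
Accept-Ranges: bytes
Content-Length: 8067
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
"""

content_type = header_value(CRASHING_RESPONSE, "Content-Type")
content_length = header_value(CRASHING_RESPONSE, "Content-Length")
```

Here content_type comes back as None because the server sent no Content-Type line at all, while content_length is found normally.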

With 2.0.10, I'm now getting a different exception:

Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1      checkPage (package:linkcheck/src/worker/worker.dart:148:29)
<asynchronous suspension>
#2      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#3      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#4      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#5      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#6      _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#7      _StreamController._add (dart:async/stream_controller.dart:640:7)
#8      _StreamController.add (dart:async/stream_controller.dart:586:5)
#9      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#10     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#11     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#12     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#13     _StreamController._add (dart:async/stream_controller.dart:640:7)
#14     _StreamController.add (dart:async/stream_controller.dart:586:5)
#15     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#16     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#17     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#18     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#19     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#20     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#21     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#22     _StreamController._add (dart:async/stream_controller.dart:640:7)
#23     _StreamController.add (dart:async/stream_controller.dart:586:5)
#24     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)

It appears that this occurs when the charset is missing from the Content-Type header. I'm guessing that this is a consequence of the default being applied when my page lacks a Content-Type entirely, but it also reveals a more general issue if a server produces a Content-Type with no associated charset value.
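
A null-safe way to handle both cases (no Content-Type at all, or a Content-Type without a charset parameter) might look like the Python sketch below. The function name charset_or_default and the choice of "utf-8" as the fallback are assumptions made for this illustration; the thread does not say which default linkcheck uses.

```python
from typing import Optional


def charset_or_default(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or `default`.

    Covers a missing header (None or empty) as well as a header that
    carries no charset parameter, such as plain "text/html".
    """
    if not content_type:
        return default
    # Parameters follow the media type, separated by semicolons.
    for part in content_type.split(";")[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() == "charset":
            # Strip optional quotes; an empty value falls back to the default.
            return value.strip().strip('"') or default
    return default
```

With this shape, the decoding step never dereferences a missing value: charset_or_default(None) and charset_or_default("text/html") both return the fallback.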

Thanks for the help on this!

zepalmer commented 4 years ago

As a note, I'm neither a maintainer nor the person who closed this issue, so I'm not permitted to reopen it. I just learned that about GitHub. :)

filiph commented 4 years ago

This is excellent info, @zepalmer! I'll look into this. No promises on speed, though. :/

zepalmer commented 4 years ago

No problem! Thanks again for the excellent tool. This is part of my workflow for updating my course website and speeds things up a lot. If it takes a while, that's fine; if it bugs me, I'll go learn Dart and send you a PR. :)

filiph commented 3 years ago

Ooof, this took way longer than I anticipated, but it's finally fixed in version 2.0.15. If things don't work as expected, please run linkcheck with --verbose and paste the output here. Thanks for the patience!