Closed LucasGorgal closed 3 years ago
I am using macOS High Sierra 10.13.6.
Sorry I didn't see this earlier. Could you please share the HTML file this failed on? If you're not sure, please share the result of linkcheck --debug <url>.
I'm having a similar problem, I think. Here's a page I've put on my server under the name sample.php:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Example Title</title>
<style type="text/css">
</style>
</head>
<body>
</body>
</html>
(Disclaimer: this is a file from an old site. I promise I don't use PHP for anything modern.)
Here's a log from my terminal illustrating the failure:
$ linkcheck --debug http://localhost/sample.php
Reading URLs:
http://localhost/sample.php
Crawl will start on the following URLs: [http://localhost/sample.php]
Crawl will check pages only on URLs satisfying: {http://localhost/sample.php**}
Crawl will skip links that match patterns: UrlSkipper<>
Crawl will check the following servers (and their robots.txt) first: {localhost}
Using 4 threads.
Checking robots.txt and availability of server: localhost
Added: http://localhost/sample.php to Worker<1> with 0ms delay
Server check of localhost complete.
Server check for localhost complete: connected, no robots.txt.
Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0 Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1 DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:326:48)
#2 checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8 _StreamController._add (dart:async/stream_controller.dart:640:7)
#9 _StreamController.add (dart:async/stream_controller.dart:586:5)
#10 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14 _StreamController._add (dart:async/stream_controller.dart:640:7)
#15 _StreamController.add (dart:async/stream_controller.dart:586:5)
#16 _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18 CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23 _StreamController._add (dart:async/stream_controller.dart:640:7)
#24 _StreamController.add (dart:async/stream_controller.dart:586:5)
#25 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)
Killing unresponsive Worker<1>
Done checking: http://localhost/sample.php (connection failed) => 0 links
- BROKEN
All jobs are done or user pressed Ctrl-C
Deduping destinations
Closing the isolate pool
Broken links
Done crawling.
Provided URLs failing:
http://localhost/sample.php (connection failed)
Error. Couldn't connect or find any links. Have you started the server?
Is it possible that this is specifically a PHP-related thing? Despite the PHP URL, the result of an HTTP request should be a valid HTML document, so I'm not sure why it would fail.
Using linkcheck 2.0.9 with Dart 2.4.1 on Debian 10. Thanks for the excellent tool!
Hi, thanks for the detailed report!
It looks like the local server you're using isn't reporting the Content-Type (MIME type). I fixed the bug that crashes linkcheck in such instances, but beyond that, there's not much I can do, unfortunately. I tentatively decided that in such cases linkcheck will try to parse the resource as if it were HTML, and assign a warning. That means you'll still get your site crawled, but you'll get a bazillion warnings on every page.
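For readers who want to see the shape of that fallback, here is a minimal sketch in Python. It is not linkcheck's actual code (linkcheck is written in Dart), and the function name resolve_media_type is hypothetical. It assumes the only input is the raw Content-Type header value, or None when the server sent none.

```python
from email.message import Message
from typing import Optional, Tuple


def resolve_media_type(content_type_header: Optional[str]) -> Tuple[str, bool]:
    """Return (media_type, assumed).

    `assumed` is True when the server sent no Content-Type and we fell
    back to treating the resource as HTML (the case that deserves a warning).
    """
    if content_type_header is None or not content_type_header.strip():
        # Hypothetical fallback mirroring the behavior described above:
        # parse as HTML, but flag it so the caller can attach a warning.
        return "text/html", True
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() returns the lowercased "maintype/subtype" part only.
    return msg.get_content_type(), False


media_type, assumed = resolve_media_type(None)
if assumed:
    print(f"warning: no Content-Type header, assuming {media_type}")
```

Returning the flag alongside the type keeps the crawl going while still letting the report say which pages were guessed at.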
I said "looks like" above because I'm not 100% sure. It's possible there's some other reason why linkcheck doesn't see any content type. If so, please feel free to reopen this issue.
The fix will land shortly as 2.0.10. Run pub global activate linkcheck to upgrade.
Thanks for pushing this update! I've just checked the server I'm using. When accessing a page which doesn't crash linkcheck, I get response headers like:
HTTP/1.1 301 Moved Permanently
Date: Mon, 02 Sep 2019 11:42:57 GMT
Server: Apache/2.4.38 (Debian)
Location: http://localhost/~zpalmer/cs21/
Content-Length: 314
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
HTTP/1.1 200 OK
Date: Mon, 02 Sep 2019 11:42:57 GMT
Server: Apache/2.4.38 (Debian)
Vary: Accept-Encoding
Content-Length: 949
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html;charset=UTF-8
The page that does crash linkcheck produces these response headers:
HTTP/1.1 200 OK
Date: Mon, 02 Sep 2019 11:43:26 GMT
Server: Apache/2.4.38 (Debian)
Last-Modified: Thu, 29 Aug 2019 20:44:47 GMT
ETag: "1f83-5914793818dc0"
Accept-Ranges: bytes
Content-Length: 8067
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
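The difference between the two responses is easy to miss by eye, so here is a small Python sketch that checks a pasted response head for a Content-Type header. The helper name has_content_type is hypothetical, and the sample strings are abbreviated copies of the headers above.

```python
import email


def has_content_type(raw_head: str) -> bool:
    """Report whether a raw HTTP response head carries a Content-Type header."""
    # Drop the status line ("HTTP/1.1 200 OK"); the rest is "Name: value" lines.
    _status_line, _, header_block = raw_head.strip().partition("\n")
    headers = email.message_from_string(header_block)
    # Indexing is case-insensitive and yields None for an absent header.
    return headers["Content-Type"] is not None


WORKING_PAGE = """HTTP/1.1 200 OK
Server: Apache/2.4.38 (Debian)
Content-Length: 949
Content-Type: text/html;charset=UTF-8
"""

CRASHING_PAGE = """HTTP/1.1 200 OK
Server: Apache/2.4.38 (Debian)
ETag: "1f83-5914793818dc0"
Content-Length: 8067
"""

print(has_content_type(WORKING_PAGE))
print(has_content_type(CRASHING_PAGE))
```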
With 2.0.10, I'm now getting a different exception:
Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0 Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1 checkPage (package:linkcheck/src/worker/worker.dart:148:29)
<asynchronous suspension>
#2 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#3 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#4 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#5 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#6 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#7 _StreamController._add (dart:async/stream_controller.dart:640:7)
#8 _StreamController.add (dart:async/stream_controller.dart:586:5)
#9 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#10 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#11 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#12 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#13 _StreamController._add (dart:async/stream_controller.dart:640:7)
#14 _StreamController.add (dart:async/stream_controller.dart:586:5)
#15 _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#16 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#17 CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#18 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#19 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#20 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#21 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#22 _StreamController._add (dart:async/stream_controller.dart:640:7)
#23 _StreamController.add (dart:async/stream_controller.dart:586:5)
#24 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)
It appears that this occurs when the charset is missing from the Content-Type header. I'm guessing that this is a consequence of the default being applied when my page lacks a Content-Type entirely, but it also reveals a more general issue if a server produces a Content-Type with no associated charset value.
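To illustrate the two cases I mean (no Content-Type at all, and a Content-Type without a charset), here is a rough Python sketch of null-safe charset handling. It is only an illustration, not linkcheck's code; the name resolve_charset and the utf-8 default are my assumptions.

```python
from email.message import Message
from typing import Optional


def resolve_charset(content_type_header: Optional[str], default: str = "utf-8") -> str:
    """Return the declared charset, or `default` when none is declared.

    Covers both a missing Content-Type header and a Content-Type that
    has no charset parameter. The utf-8 default is an assumption.
    """
    if not content_type_header:
        return default
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_charset() lowercases the value and returns `failobj`
    # when the header has no charset parameter.
    return msg.get_content_charset(failobj=default)


for header in ("text/html; charset=iso-8859-1", "text/html", None):
    print(repr(header), "->", resolve_charset(header))
```

The point is that neither missing piece is ever dereferenced directly; each has an explicit fallback.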
Thanks for the help on this!
As a note, I was not a maintainer closing this issue, so I'm not permitted to re-open it. I just learned that about GitHub. :)
This is excellent info, @zepalmer! I'll look into this. No promises on speed, though. :/
No problem! Thanks again for the excellent tool. This is part of my workflow for updating my course website and speeds things up a lot. If it takes a while, that's fine; if it bugs me, I'll go learn Dart and send you a PR. :)
Ooof, this took way longer than I anticipated, but it's finally fixed in version 2.0.15. If things don't work as expected, please run linkcheck with --verbose and paste the output here. Thanks for the patience!