dantleech / fink

PHP Link Checker
MIT License

PHP Fatal error: getRemainingStreams() must be of the type int #103

Open dwreski opened 4 years ago

dwreski commented 4 years ago

This is after running successfully for more than an hour on our site (replaced with "example" here; link disabled). This is using the version available today. What more can I do to troubleshoot this?

[200] httpx://twitter.com/share?url=httpx://example.com/advisories/archlinux/archlinux-201503-3-lib32-elfutils-directory-traversal&text=ArchLinux:%20201503-3:%20lib32-elfutils:%20directory%20traversal
----------------------------------------------------------------
Concurrency: 9, Queue size: 3757, Failures: 17797/64480 (27.60%)
Rate: 19.60 r/sec, 639.60 ms/r
Uptime: 01h 02m 03s
PHP Fatal error:  Uncaught TypeError: Return value of Amp\Http\Client\Connection\Internal\Http2ConnectionProcessor::getRemainingStreams() must be of the type int, float returned in /var/www/webstage.example.com-443/fink/vendor/amphp/http-client/src/Connection/Internal/Http2ConnectionProcessor.php:906
Stack trace:
#0 /var/www/webstage.example.com-443/fink/vendor/amphp/http-client/src/Connection/Http2Connection.php(55): Amp\Http\Client\Connection\Internal\Http2ConnectionProcessor->getRemainingStreams()
#1 /var/www/webstage.example.com-443/fink/vendor/amphp/amp/lib/functions.php(90): Amp\Http\Client\Connection\Http2Connection->Amp\Http\Client\Connection\{closure}()
#2 /var/www/webstage.example.com-443/fink/vendor/amphp/http-client/src/Connection/Http2Connection.php(66): Amp\call()
#3 /var/www/webstage.example.com-443/fink/vendor/amphp/http-client/src/Connection/ConnectionLimitingPool.php(299): Amp\Http\Client\Connection\Http2Connection->getStream()
#4 /var/www/webstage.example.com-443/fink/vend in /var/www/webstage.example.com-443/fink/vendor/amphp/http-client/src/Connection/Internal/Http2ConnectionProcessor.php on line 906 
dantleech commented 4 years ago

Possibly an Amphp issue (cc @kelunik). At first glance I cannot see how that method could return a float.

dantleech commented 4 years ago

But otherwise can you provide the full stack trace? We could be more defensive.

Also, not sure if it is related, but the prior URL looks very odd: httpx? Fink should not have crawled that URL.

dwreski commented 4 years ago

How do I provide a full stack trace?

The httpx was me substituting in place of https to prevent it from being crawled here or becoming an actual URL, sorry.

dantleech commented 4 years ago

I'd assumed there was more of the stack trace in the original trace you provided? Maybe indicating at which point Fink called the Amp HTTP client? But no worries, there should be only one place that does that :D

dantleech commented 4 years ago

Maybe you could try changing:

https://github.com/dantleech/fink/blob/cc30101110845c353a007a93efa678ffe51e77dc/lib/Model/Dispatcher.php#L86

to \Throwable instead? I think it should then catch this error and continue (though the HTTP client could be broken at that point).

dwreski commented 4 years ago

I don't understand "\Throwable" - replace the whole line with just that?

dantleech commented 4 years ago

No, change it so that it becomes:

            try {
                yield from $this->crawler->crawl($url, $this->queue, $reportBuilder);
            } catch (\Throwable $exception) {
                $reportBuilder->withException($exception);
            }
dwreski commented 4 years ago

Okay, running.

I also notice the above link that caused the error isn't actually a 404, so not sure why it thinks that it is. Perhaps it just took too long to respond? Here's the real link.

https://twitter.com/share?url=https://linuxsecurity.com/advisories/archlinux/archlinux-201503-3-lib32-elfutils-directory-traversal&text=ArchLinux:%20201503-3:%20lib32-elfutils:%20directory%20traversal

dantleech commented 4 years ago

Hmm, it says 200 though? That link is also fine if I run it through fink directly. If the crash were caused by a link, it would be the next one, which isn't shown.

dwreski commented 4 years ago

Ah yes, apologies. It seems virtually every link that's somehow associated with social media or a "sharer.php" redirect fails. Here are two that were marked red that I managed to capture.

https://www.facebook.com/sharer.php?u=https://linuxsecurity.com/news/hackscracks/new-attack-lets-android-apps-capture-loudspeaker-data-without-any-permission

https://www.linkedin.com/shareArticle?mini=true&url=https://linuxsecurity.com/news/hackscracks/lazarus-pivots-to-linux-attacks-through-dacls-trojan

Screenshot attached. Thanks for the great support.

[Screenshot: fink-dead-links]

dwreski commented 4 years ago

It failed again, perhaps due to the changes I made.

PHP Fatal error: Uncaught TypeError: Argument 1 passed to DTL\Extension\Fink\Model\ReportBuilder::withException() must be an instance of Exception, instance of TypeError given, called in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/Dispatcher.php on line 87 and defined in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/ReportBuilder.php:73
Stack trace:
#0 /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/Dispatcher.php(87): DTL\Extension\Fink\Model\ReportBuilder->withException()
#1 [internal function]: DTL\Extension\Fink\Model\Dispatcher->DTL\Extension\Fink\Model\{closure}()
#2 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Coroutine.php(115): Generator->throw()
#3 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Failure.php(33): Amp\Coroutine->Amp\{closure}()
#4 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Internal/Placeholder.php(143): Amp\Failure->onResolve()
#5 /var/www/webstage. in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/ReportBuilder.php on line 73

dwreski commented 4 years ago

    84            try {
    85                yield from $this->crawler->crawl($url, $this->queue, $reportBuilder);
    86            } catch (\Throwable $exception) {
    87                $reportBuilder->withException($exception);
    88            }

kelunik commented 4 years ago

@dwreski Are you on a 32 bit platform?
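For readers wondering how a method declared `: int` can return a float at all: one possibility, and presumably the reason for asking about 32-bit platforms, is integer overflow. PHP silently converts an integer result to a float once it exceeds `PHP_INT_MAX`, which is much smaller on 32-bit builds. The sketch below only illustrates that mechanism. `remainingAfter()` and `describeFailure()` are hypothetical names, not the amphp code, and the thread does not establish that overflow was the actual cause here.

```php
<?php
declare(strict_types=1);

// Hypothetical counter update; NOT the amphp implementation.
function remainingAfter(int $remaining, int $delta): int
{
    // int + int becomes a float once the result exceeds PHP_INT_MAX,
    // and a float that large cannot be returned from an ": int" function.
    return $remaining + $delta;
}

// Runs the update and reports either the result or the TypeError message.
function describeFailure(int $remaining, int $delta): string
{
    try {
        return 'ok: ' . remainingAfter($remaining, $delta);
    } catch (\TypeError $error) {
        return 'TypeError: ' . $error->getMessage();
    }
}

echo describeFailure(10, 1), PHP_EOL;          // stays an int
echo describeFailure(PHP_INT_MAX, 1), PHP_EOL; // overflows to float
```

The exact wording of the TypeError message differs between PHP versions, so the sketch prints it rather than assuming it.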

dwreski commented 4 years ago

No.

    $ uname -a
    Linux sage.inside.example.com 5.7.6-201.fc32.x86_64 #1 SMP Mon Jun 29 15:15:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

kelunik commented 4 years ago

@dwreski Could you check whether https://github.com/amphp/http-client/commit/d272ceba84c7c24345c33adfa72951269763de6a changes anything?

dwreski commented 4 years ago

I do not have the test directory on my installation, so I could not install that file. I did download and put in place the ./amphp/http-client/src/Connection/Internal/Http2ConnectionProcessor.php file, however.

I ran it once and lost the connection to the server after what appeared to be about 7 minutes of run time. The second time it died with what appears to be a different error after just a minute or so.

[Screenshot: fink-dead-links1]

PHP Fatal error: Uncaught TypeError: Argument 1 passed to DTL\Extension\Fink\Model\ReportBuilder::withException() must be an instance of Exception, instance of TypeError given, called in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/Dispatcher.php on line 87 and defined in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/ReportBuilder.php:73
Stack trace:
#0 /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/Dispatcher.php(87): DTL\Extension\Fink\Model\ReportBuilder->withException()
#1 [internal function]: DTL\Extension\Fink\Model\Dispatcher->DTL\Extension\Fink\Model\{closure}()
#2 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Coroutine.php(115): Generator->throw()
#3 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Failure.php(33): Amp\Coroutine->Amp\{closure}()
#4 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/Internal/Placeholder.php(143): Amp\Failure->onResolve()
#5 /var/www/webstage. in /var/www/webstage.linuxsecurity.com-443/fink/vendor/dantleech/fink/lib/Model/ReportBuilder.php on line 73

dwreski commented 4 years ago

Perhaps you'd like to run it on the site from where you are?

https://linuxsecurity.com/

It's a very large site with tens of thousands of articles, which is why I was running it locally, but given it typically dies pretty quickly, maybe it would be helpful.

dantleech commented 4 years ago

@dwreski btw, that error is caused by the change I suggested: withException() only accepts an Exception, and a TypeError is not one. Just switch it back to Exception and it will work again.
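As background on why the `\Throwable` change backfired: in PHP 7 and later, `TypeError` extends `Error`, not `Exception`, so a method type-hinted with `\Exception` rejects it. A minimal sketch of a more defensive pattern follows, assuming one wanted to keep `catch (\Throwable ...)`. `ReportSketch`, `toException()` and `recordFailure()` are hypothetical stand-ins written for this illustration. They are not Fink's classes, and this is not necessarily how Fink handles the case.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a report builder whose method only accepts \Exception.
final class ReportSketch
{
    /** @var \Exception|null */
    public $exception = null;

    public function withException(\Exception $exception): void
    {
        $this->exception = $exception;
    }
}

// Hypothetical helper: Exceptions pass through unchanged, Errors
// (such as TypeError) are wrapped so the original stays reachable
// through getPrevious().
function toException(\Throwable $throwable): \Exception
{
    if ($throwable instanceof \Exception) {
        return $throwable;
    }

    return new \RuntimeException($throwable->getMessage(), 0, $throwable);
}

// Runs a task and records any failure, including PHP Errors, on the report.
function recordFailure(ReportSketch $report, callable $task): void
{
    try {
        $task();
    } catch (\Throwable $throwable) {
        $report->withException(toException($throwable));
    }
}

$report = new ReportSketch();
recordFailure($report, function (): void {
    throw new \TypeError('must be of the type int, float returned');
});

echo get_class($report->exception), PHP_EOL;
```

Widening the parameter type of the receiving method to `\Throwable` would be the other obvious option. The wrapper is shown here only because it leaves the receiving signature untouched.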

dwreski commented 4 years ago

Okay, got it to run for almost an hour before choking again.

[Screenshot: fink-dead-links2]

    913    public function getRemainingStreams(): int
    914    {
    915        return $this->remainingStreams;
    916    }

[---] https://www.linkedin.com/shareArticle?mini=true&url=https://linuxsecurity.com/advisories/deblts/debian-lts-dla-1860-1-libxslt-security-update-12-16-47
----------------------------------------------------------------
Concurrency: 10, Queue size: 12138, Failures: 2236/47627 (4.69%)
Rate: 81.20 r/sec, 119.70 ms/r
Uptime: 00h 52m 58s
PHP Fatal error: Uncaught TypeError: Return value of Amp\Http\Client\Connection\Internal\Http2ConnectionProcessor::getRemainingStreams() must be of the type int, float returned in /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/http-client/src/Connection/Internal/Http2ConnectionProcessor.php:915
Stack trace:
#0 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/http-client/src/Connection/Http2Connection.php(55): Amp\Http\Client\Connection\Internal\Http2ConnectionProcessor->getRemainingStreams()
#1 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/amp/lib/functions.php(90): Amp\Http\Client\Connection\Http2Connection->Amp\Http\Client\Connection\{closure}()
#2 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/http-client/src/Connection/Http2Connection.php(66): Amp\call()
#3 /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/http-client/src/Connection/ConnectionLimitingPool.php(299): Amp\Http\Client\Connection\Http2Connection->getStream()
#4 /var/www/webstage.linuxsecurity.com-443/fink/vend in /var/www/webstage.linuxsecurity.com-443/fink/vendor/amphp/http-client/src/Connection/Internal/Http2ConnectionProcessor.php on line 915

kelunik commented 4 years ago

Please try again with https://github.com/amphp/http-client/commit/4cc8b273fc3ab2ce2aa979a2ac0273d0e224c562. I could reproduce the error, and with this commit it ran fine for 20 minutes, at which point I aborted it. Previously it always failed at about 10 minutes.

dwreski commented 4 years ago

It appears to be running more reliably now, but it's finding a lot of links it considers to be 404s which are not.

https://twitter.com/share?url=https://linuxsecurity.com/news/cryptography/software-developers-are-failing-to-implement-crypto-correctly-data-reveals&text=Software%20developers%20are%20failing%20to%20implement%20crypto%20correctly,%20data%20reveals

https://www.facebook.com/sharer.php?u=https://linuxsecurity.com/features/features/building-a-vpn-using-yavipin

Not all of the "share" links are 404s, but a great majority of the "share" links are.

Also, it would be good to be able to select a number of URLs to check at runtime, so I can get an idea of what the output looks like before having to wait for the whole site to complete.

Perhaps it can start to write the 404 errors to a file so we can view them? Or extend the display beyond just five or six lines so we can view them on the screen as it's running?

dwreski commented 4 years ago

It would also be good to be able to resume a session. My connection to the server dropped after having run fink for more than three hours. It's too bad that it all has to be done over.

dwreski commented 4 years ago

It managed to make its way through a complete run without failing.

Do you have any recommendations on procedures to convert the JSON output into something more usable? Of course I could convert it into a CSV, but thought there might be something already set up to graph the data or display it in a more usable way?

kelunik commented 4 years ago

@dwreski You can use -o to write a complete log which you can filter while it's running. You can use jq for filtering and other things, but I'm not aware of a simple graphing solution I could recommend.
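For the CSV idea raised above, a rough PHP sketch of post-processing the log is shown here. It assumes the log written with -o contains one JSON object per line and that each object has `url`, `status` and `referrer` fields. Those field names and the helpers `filterByStatus()` and `toCsv()` are assumptions made for this illustration and should be checked against the real output.

```php
<?php
declare(strict_types=1);

// Keeps only the records with the given HTTP status.
// Field names (url, status, referrer) are ASSUMED, not confirmed.
function filterByStatus(iterable $lines, int $status): array
{
    $rows = [];
    foreach ($lines as $line) {
        $record = json_decode(trim((string) $line), true);
        if (!is_array($record) || ($record['status'] ?? null) !== $status) {
            continue; // skip blank lines, invalid JSON and other statuses
        }
        $rows[] = [
            'url' => (string) ($record['url'] ?? ''),
            'status' => $status,
            'referrer' => (string) ($record['referrer'] ?? ''),
        ];
    }

    return $rows;
}

// Renders the filtered rows as CSV text with a header line.
function toCsv(array $rows): string
{
    $handle = fopen('php://memory', 'w+');
    fputcsv($handle, ['url', 'status', 'referrer'], ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($handle, [$row['url'], $row['status'], $row['referrer']], ',', '"', '\\');
    }
    rewind($handle);
    $csv = (string) stream_get_contents($handle);
    fclose($handle);

    return $csv;
}

// Usage: php this-script.php <log file>   (the log file path is up to you)
$path = $_SERVER['argv'][1] ?? null;
if (is_string($path) && is_file($path)) {
    echo toCsv(filterByStatus(new \SplFileObject($path), 404));
}
```

Because it reads the log line by line, the script can also be pointed at a log that fink is still writing, which gives a partial list of 404s before the crawl finishes.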

Please open a new issue for feature requests such as resumption.

The 404s might also be some crawler prevention kicking in.

kelunik commented 4 years ago

I've tagged https://github.com/amphp/http-client/releases/tag/v4.4.1, which fixes this issue.