Swader / diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
MIT License

Bizarre issue with Diffbot using guzzlehttp #50

Open jonathantullett opened 7 years ago

jonathantullett commented 7 years ago

I've created a Crawl API job which has a few hundred results. I'm trying to fetch the results with type:article (i.e. $bot->search("type:article") with setNum set to "all"), and it throws an exception:
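For context, the failing call looks roughly like this (a minimal sketch; the token is a placeholder, and the method names are as I recall them from this library's README, so double-check against the installed version):

```php
<?php

require 'vendor/autoload.php';

use Swader\Diffbot\Diffbot;

// Placeholder token - substitute a real one.
$diffbot = new Diffbot('my_token');

// Search the crawl's results for article-type entries and ask for
// every match at once. Requesting "all" (or, as it turns out, anything
// above ~60 results) is what triggers the exception below.
$search = $diffbot->search('type:article')->setNum('all');

$results = $search->call();
```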

PHP Warning:  curl_multi_exec(): Unable to create temporary file, Check permissions in temporary files directory. in /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php on line 106

PHP Fatal error:  Uncaught GuzzleHttp\Exception\RequestException: cURL error 23: Failed writing body (2749 != 16384) (see http://curl.haxx.se/libcurl/c/libcurl-errors.html) in /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php:187
Stack trace:
#0 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(150): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array)
#1 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php(103): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#2 /home/tullettj/websites/core-code/lib/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(179): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlMultiHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory))
#3 /home/tullettj/websites/c in /home/tullettj/websites/core-code/lib/vendor/php-http/guzzle6-adapter/src/Promise.php on line 127


So I've played with the setNum values and 60 seems to be the magic number. If I query for 60 or less, it's fine, however if I go for 61 or above, it throws this exception.

Have you seen this before, @Swader? It's a bit of a head-scratcher (I have ~2 GB free in the temporary files directory).

Thanks!

jonathantullett commented 7 years ago

I've run it with a few other searches and the cut-off values are arbitrary. I thought it might be memory_limit related, but the script is configured with a memory_limit of -1 (i.e. unlimited).
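One thing that may be worth checking alongside memory_limit: which directory PHP actually resolves as its temp dir, and whether the PHP process can write to it (the sys_temp_dir or open_basedir ini settings can point somewhere other than the partition you checked). A standalone probe, assuming nothing beyond the standard library:

```php
<?php

// Report the settings that influence where temporary files go.
$tmpDir = sys_get_temp_dir();
echo "Temp dir:     {$tmpDir}\n";
echo "memory_limit: " . ini_get('memory_limit') . "\n";
echo "open_basedir: " . (ini_get('open_basedir') ?: '(not set)') . "\n";

// tempnam() creates a real file in that directory, similar to what
// php://temp does once a buffered body grows past its in-memory limit.
$probe = tempnam($tmpDir, 'gz_');
if ($probe === false) {
    echo "FAILED to create a temporary file in {$tmpDir}\n";
} else {
    echo "OK, created {$probe}\n";
    unlink($probe);
}
```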

Swader commented 7 years ago

That'll happen with large bodies :( See this and this.
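If the problem is indeed Guzzle spooling large bodies through php://temp (which falls back to the system temp dir once a body grows past ~2 MB), one possible hack is to bypass the Diffbot client and call the Search endpoint with plain Guzzle, handing it an explicit sink in a directory known to be writable. A sketch only: the endpoint URL and parameter names here are my assumption of what the client sends, so verify them against the Diffbot docs before relying on this.

```php
<?php

require 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();

// Stream the response body straight to a file we control, instead of
// letting Guzzle buffer it via php://temp.
$sinkPath = '/var/tmp/diffbot-search.json';

$response = $client->request('GET', 'https://api.diffbot.com/v3/search', [
    'query' => [
        'token' => 'my_token',     // placeholder
        'query' => 'type:article',
        'num'   => 'all',
    ],
    'sink' => $sinkPath,
]);

$results = json_decode(file_get_contents($sinkPath), true);
```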

Let me know if you manage to hack past it.

jonathantullett commented 7 years ago

@Swader I've been working around this so far by decreasing the number of results downloaded if there's an exception thrown.

However, I'm now starting to see it thrown even when only a single result (setNum(1)) is requested. This is rather problematic. Can you think of any way around this, or do we just have to consider them bad searches?
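For reference, the decrease-on-exception workaround can be sketched as a halving loop. fetchWithBackoff and its callable are hypothetical here, stand-ins for whatever wraps the actual search call and rethrows the cURL error 23 failure:

```php
<?php

/**
 * Halve the requested result count after a failure, bottoming out at 0.
 */
function nextNum(int $num): int
{
    return intdiv($num, 2);
}

/**
 * Retry a fetch with progressively smaller result counts.
 * $fetch is a hypothetical callable wrapping the Diffbot search call;
 * it is expected to throw on the cURL error 23 failure described above.
 */
function fetchWithBackoff(callable $fetch, int $num): array
{
    while ($num > 0) {
        try {
            return $fetch($num);
        } catch (\RuntimeException $e) {
            $num = nextNum($num);
        }
    }
    // Even setNum(1) failed - treat the search as bad.
    throw new \RuntimeException('Search failed even for a single result');
}
```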

Swader commented 7 years ago

@jonathantullett I'm sorry about the delay - I didn't see this until now. I'll play around with it when I find time. From what I can tell it's still related to the links above, so I'll have to modify the underlying stack to the tac method without implicitly relying on Guzzle to handle everything, and it should work. This would, however, increase the dependency on cURL. I'll think about the best solution for everyone.

Swader commented 6 years ago

@jonathantullett to continue on our discussion from Support - how are you calling the hundreds of search calls? I think I may be misunderstanding what's going on, as I've been unable to reproduce the hung calls. Can you share your code?

jonathantullett commented 6 years ago

@Swader this is a different issue. This one is reproduced by trying to download setNum($XX) articles for a search (I use the min time on the search), and I see the problem on a number of searches - often related to the size of the pages being returned.

I'll find a search which exhibits the issue and post it later (I'm not at home at the moment), but this is completely unrelated to the dangling HTTPS connection issue.

Swader commented 6 years ago

No, I know - I just had no other way to ping you here directly 😬 A new issue for the hung calls would be appreciated.