RichWeber / rolling-curl

Automatically exported from code.google.com/p/rolling-curl
Better syntax for streaming file processing : huge XML files #21

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This module is really wonderful. It saved the day in my application, which 
I would otherwise have had to move to a threading solution!

--------------------
However, the call syntax is not quite optimal for what I'm doing.  I am 
processing huge XML files that are too big to load into memory, and too big to 
process completely before starting the curl operation.  This syntax would be 
ideal:

<?php
require("RollingCurl.php");

function request_callback($response, $info, $request, $callback_parameter) {
    ...
}

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
while( $xml = get_next_xml_element() ) {
    // A fresh request per element, so requests already queued are not overwritten
    $rq = new RollingCurlRequest($xml['url']);
    $rq->callback_parameter = $xml;   // proposed: passed to the callback as 4th argument
    $rc->execute_until_blocked($rq);  // proposed: blocks if queue full
}
$rc->finish(); // proposed: returns after last pending request is done
?>

Then I can maintain my streaming process, yet still stuff requests into curl 
as fast as they will go.  Also note the extra parameter that gets passed to the 
callback.
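
For comparison, the same streaming pattern can be sketched directly on PHP's 
curl_multi_* functions, without RollingCurl. This is only an illustration under 
assumptions: stream_requests(), its parameters, and the 'url' key are 
hypothetical names, not part of the library, and error handling is left out.

```php
<?php
/**
 * Keep at most $window transfers in flight while pulling work lazily from
 * an iterator, so the input never has to be loaded in full.
 * stream_requests() is a hypothetical helper, not a RollingCurl method.
 *
 * @param Iterator $items    yields arrays with a 'url' key
 * @param callable $callback function ($response, array $info, $item)
 * @param int      $window   maximum number of concurrent transfers
 * @return int number of completed transfers
 */
function stream_requests(Iterator $items, $callback, $window = 20)
{
    $mh = curl_multi_init();
    $pending = array();  // id => item, for transfers still in flight
    $nextId = 0;
    $done = 0;
    $items->rewind();

    while (true) {
        // Top up the window; the iterator is only advanced while a slot is free.
        while (count($pending) < $window && $items->valid()) {
            $item = $items->current();
            $items->next();
            $id = (string) $nextId++;
            $ch = curl_init($item['url']);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_PRIVATE, $id);  // tag the handle with our id
            curl_multi_add_handle($mh, $ch);
            $pending[$id] = $item;
        }
        if (!$pending) {
            break;  // input exhausted and nothing in flight
        }

        curl_multi_exec($mh, $running);

        // Hand every finished transfer to the callback and free its slot.
        $harvested = false;
        while ($msg = curl_multi_info_read($mh)) {
            $ch = $msg['handle'];
            $id = curl_getinfo($ch, CURLINFO_PRIVATE);
            call_user_func($callback, curl_multi_getcontent($ch), curl_getinfo($ch), $pending[$id]);
            unset($pending[$id]);
            curl_multi_remove_handle($mh, $ch);
            $done++;
            $harvested = true;
        }

        // Nothing finished this round: wait for activity instead of spinning.
        if (!$harvested && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    }
    return $done;
}
```

Any Iterator works as the source, including a generator that yields one XML 
element at a time. The whole item is handed back to the callback as its third 
argument, which plays the role of the extra callback parameter requested above.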

Original issue reported on code.google.com by digitalbitstream@gmail.com on 9 Jul 2011 at 5:13

GoogleCodeExporter commented 9 years ago
As a step in that direction, example.php could show attaching some context data 
to the request object (so the callback knows what triggered the request):

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$i = 0;
foreach ($urls as $url) {
    $request = new RollingCurlRequest($url);
    $request->extra_data = $i++;
    $rc->add($request);
}
$rc->execute();
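
On the receiving side, the callback can read that value back from the request 
object. A minimal sketch, assuming the callback receives the request as its 
third argument (as in the signature from the original report), that the request 
exposes its URL as a public url property, and that RollingCurl accepts any PHP 
callable. make_collector() is a hypothetical helper name.

```php
<?php
/**
 * Build a callback that files each response under the context value
 * attached to its request. make_collector() is a hypothetical helper.
 */
function make_collector(array &$results)
{
    return function ($response, $info, $request) use (&$results) {
        $results[$request->extra_data] = array(
            'url'    => $request->url,
            'status' => isset($info['http_code']) ? $info['http_code'] : null,
            'bytes'  => strlen((string) $response),
        );
    };
}

// Usage with the loop above (not run here, it needs RollingCurl.php):
// $results = array();
// $rc = new RollingCurl(make_collector($results));
```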

Original comment by digitalbitstream@gmail.com on 11 Jul 2011 at 8:01

GoogleCodeExporter commented 9 years ago
Another use case for the above syntax:

In your callback you see
   http-equiv="refresh"
and wish to queue the target page to load (the page you just loaded is probably 
useless).
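
Spotting the redirect target inside the callback could look roughly like this. 
It is a regex-based sketch, not a full HTML parser. find_meta_refresh_url() is 
a hypothetical helper, and relative targets are returned as-is rather than 
resolved against the page URL.

```php
<?php
/**
 * Return the target URL of a <meta http-equiv="refresh"> tag, or null if
 * the page has none (or the tag only reloads the same page).
 * find_meta_refresh_url() is a hypothetical helper name.
 */
function find_meta_refresh_url($html)
{
    // Locate the meta tag itself.
    $tagPattern = '/<meta[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*>/i';
    if (!preg_match($tagPattern, $html, $tag)) {
        return null;
    }
    // Pull out its quoted content attribute, e.g. "5; url=http://example.com/".
    if (!preg_match('/content\s*=\s*(["\'])(.*?)\1/is', $tag[0], $content)) {
        return null;
    }
    // Skip the delay, the separator and the optional "url=" prefix.
    if (!preg_match('/^\s*[\d.]*\s*[;,]\s*(?:url\s*=\s*)?(.+)$/is', $content[2], $m)) {
        return null;
    }
    return trim($m[1], " \t\n\r\"'");
}
```

The returned URL would then go into a new request, which is where being able to 
add to the queue while it is already running becomes necessary.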

Original comment by digitalbitstream@gmail.com on 11 Jul 2011 at 9:08