Open eddiejaoude opened 11 years ago
(1) Save the crawls into a text file
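(2) Run php batch_start with four arguments: the URL file, the WPT location ID, the import save flag and the label. A correct example should be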
php batch_start /var/www/httparchive/httpdocs/bulktest/run.txt IE8 0 BBC-Run1
(3) for the label use the same unique ID as the file name
(4) Correct response should be "DONE submitting batch run"
(5) caveat #1 - this is "single threaded" in the sense that only one batch can run at a time. If you call batch_start while a batch is already running, you seem to clobber the previous batch. There is supposed to be a check for this, but it doesn't appear to be working.
(6) caveat #2 - nothing actually happens until you call batch_process.php, repeatedly (every x minutes), until the batch finishes. We need to set up cron jobs for this.
(7) caveat #3 - even after the batch has finished processing, you then need to call updatestats.php for the data to be calculated and added to the database.
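The order of calls in the steps above can be sketched in Python. This is only a sketch under assumptions: run_batch and is_finished are hypothetical names, and the thread does not say how to detect that a batch has finished, so that check is passed in by the caller. The commands are handed to a runner function, so the example below records them instead of executing PHP.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def run_for_real(cmd: Sequence[str]) -> None:
    """Runner that actually executes the command (not used in the demo below)."""
    subprocess.run(list(cmd), check=True)


def run_batch(start_args: List[str],
              is_finished: Callable[[], bool],
              run: Callable[[Sequence[str]], None],
              poll_seconds: float = 600.0,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Start a batch, process it until finished, then update the stats.

    is_finished is a hypothetical callback supplied by the caller.
    """
    run(["php", "batch_start", *start_args])
    while True:
        # Nothing happens to the batch until batch_process.php is called.
        run(["php", "batch_process.php"])
        if is_finished():
            break
        sleep(poll_seconds)
    # Stats are only calculated once updatestats.php is called.
    run(["php", "updatestats.php"])


# Dry run: record the commands instead of executing them.
calls: List[Sequence[str]] = []
checks = iter([False, False, True])  # pretend the batch finishes on the third check
run_batch(["run.txt", "IE8", "0", "BBC-Run1"],
          is_finished=lambda: next(checks),
          run=calls.append,
          sleep=lambda seconds: None)
for cmd in calls:
    print(" ".join(cmd))
```

Passing run=run_for_real would execute the commands for real. In practice cron would probably replace the polling loop, as the README suggests.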
Just to clarify the syntax:
php batch_start <URL file> <WPT Location ID> <import save flag> <sub-label>
The WPT Location ID needs to be IE8, in upper case, even though it will be invoking IE9 (this is a bug, I think, but we'd need to check all the HTTPArchive code to ensure that any location ID is correctly parameterised)
import save flag - use 0 for now
sub-label - needs to be unique or nothing happens!!!
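A minimal Python sketch of the argument rules above. It assumes the four positional arguments are the URL file, the WPT location ID, the import save flag and the sub-label, in the order used by the example command in this thread. The helper build_batch_start_command and the used_labels set are hypothetical. Nothing is executed, the function only builds the argument list.

```python
from typing import List, Set


def build_batch_start_command(url_file: str, location_id: str,
                              label: str, used_labels: Set[str],
                              import_flag: int = 0) -> List[str]:
    """Build (but do not run) a batch_start command line.

    Hypothetical helper: the argument order is assumed from the example
    command in this thread, not taken from the HTTP Archive source.
    """
    if not label or label in used_labels:
        # A repeated sub-label reportedly makes batch_start do nothing.
        raise ValueError(f"sub-label must be unique, got {label!r}")
    used_labels.add(label)
    return ["php", "batch_start", url_file,
            location_id.upper(),  # the location ID has to be upper-case, e.g. IE8
            str(import_flag),     # advice in this thread: use 0 for now
            label]


used: Set[str] = set()
cmd = build_batch_start_command(
    "/var/www/httparchive/httpdocs/bulktest/run.txt", "ie8", "BBC-Run1", used)
print(" ".join(cmd))
```

Calling the helper a second time with the same sub-label raises ValueError, which is easier to notice than a batch that silently does nothing.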
https://code.google.com/p/httparchive/source/browse/trunk/bulktest/README.txt
The description of included files:
bootstrap.inc: Configure the environment of execution
batch_lib: The collection of all the functions needed by batch testing
batch_start: Start a new batch test
batch_process: Perform all the tasks of a batch test
How to run a batch?
a) Run "php batch_start" to kick off a new batch test. It detects whether a batch test is already running in the system and, if so, kills it. It reads the input URL file and creates the MySQL tables (if necessary) and the corresponding records. It also prints a summary of the previous batch test before starting the new one.
b) Run "php batch_process" repeatedly to carry out a single batch test. On each run, the script forks several subprocesses, each responsible for the tests in one status, and tries to move all tests in that status to the next step. When a run completes, a summary of the batch is printed. The script also guarantees that no other instance is running when it starts; if there is one, it exits.
To automate the whole periodic batch testing, you could schedule batch_process.php to run hourly in cron - if there's nothing to do it just exits. batch_start.php could be triggered manually or scheduled in cron to run every 2 weeks or whatever the interval for testing would be.
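The "exit if another instance is running" behaviour described for batch_process is what makes an hourly cron schedule safe. One common way to get that guarantee is a non-blocking file lock. The Python sketch below illustrates the idea only; it is not how the HTTP Archive scripts implement it, the name try_lock and the lock file path are made up for the example, and fcntl is available on Unix-like systems only.

```python
import fcntl
import os
import tempfile
from typing import IO, Optional


def try_lock(path: str) -> Optional[IO[str]]:
    """Return an open handle holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


# Hypothetical lock file location, in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "batch_process.lock")

first = try_lock(lock_path)   # this "instance" gets the lock
second = try_lock(lock_path)  # a second "instance" would have to exit
print(first is not None, second is None)
```

The lock is released when the handle is closed or the process exits, so a crashed run does not block the next scheduled one.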
caveat #1 - this is "single threaded" in the sense that currently only one batch can run at once - if you call batch_start when there is already a batch running you clobber the previous batch it seems... there is supposed to be a check for this but it doesn't appear to be working...
Is this still the case? Only one batch at a time? This is worse than 'single threaded' if running another independent thread causes issues with the first.
Yes, as it stands right now only one batch can run at once unless we write a patch for HTTPArchive.
It’s on Souders’ “to do” list…
Basically, there is a table that contains all the information about the running batch, and I don’t think it has a key unique to that batch, so there is no way to distinguish batch 1 from batch 2. (Alternatively, you could create a unique temp table per batch.)
Ditto when you calculate the stats – I am not sure if it just processes “all the results that are there” as opposed to “everything for a run”.
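To make the "key unique to that batch" idea concrete, here is a small Python sketch using an in-memory SQLite database. The table and column names (status, batch_id, state) are invented for illustration and are not the real HTTP Archive schema, which uses MySQL. The point is only that once every row carries a batch ID, two batches can share the table, and progress or stats can be queried per batch.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (batch, URL).
conn.execute("""
    CREATE TABLE status (
        batch_id TEXT NOT NULL,
        url      TEXT NOT NULL,
        state    TEXT NOT NULL DEFAULT 'queued',
        PRIMARY KEY (batch_id, url)
    )
""")


def start_batch(batch_id: str, urls: List[str]) -> None:
    """Queue the URLs of a new batch without touching other batches."""
    conn.executemany("INSERT INTO status (batch_id, url) VALUES (?, ?)",
                     [(batch_id, url) for url in urls])


def finish_url(batch_id: str, url: str) -> None:
    """Mark one URL of one batch as done."""
    conn.execute("UPDATE status SET state = 'done' WHERE batch_id = ? AND url = ?",
                 (batch_id, url))


def pending(batch_id: str) -> int:
    """Count the URLs of this batch that are not done yet."""
    row = conn.execute(
        "SELECT COUNT(*) FROM status WHERE batch_id = ? AND state != 'done'",
        (batch_id,)).fetchone()
    return row[0]


start_batch("BBC-Run1", ["http://a.example/", "http://b.example/"])
start_batch("BBC-Run2", ["http://a.example/"])  # the same URL in a second batch is fine
finish_url("BBC-Run2", "http://a.example/")
print(pending("BBC-Run1"), pending("BBC-Run2"))  # prints: 2 0
```

The same WHERE batch_id = ? filter would let a stats step process "everything for a run" rather than "all the results that are there".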
Send crawled URLs to HTTP Archive