alexz-enwp / wikitools

Python package for working with MediaWiki wikis

very high lag time: possible causes? #34

Open gg4u opened 8 years ago

gg4u commented 8 years ago

Hello,

I'm experiencing really high lag times. I even hit the message ('Server lag, sleeping 14 seconds').

Could you please suggest what the possible reasons might be?

I am just running this test from the console:

from wikitools import wiki
from wikitools import api

def search_wikipedia_random():
    # Ask the MediaWiki API for one random article (namespace 0) and return its page id
    site = wiki.Wiki("https://en.wikipedia.org/w/api.php")
    params = {
        'action': 'query',
        'list': 'random',
        'rnnamespace': 0,
        'rnfilterredir': 'all',
        'rnlimit': 1,
        'redirects': '',
        'format': 'json',
    }
    request = api.APIRequest(site, params)
    result = request.query()
    return result['query']['random'][0]['id']

import time

start = time.time()
search_wikipedia_random()
end = time.time()
print(end - start)

and got:

16.8418629169
13.1237468719

seconds!

I am not having problems browsing, so I don't think it's a problem with my connection (right now I'm listening to YouTube and doing other things in the evening :) ). I wonder if I could be getting throttled for not having configured something (headers?), or if I'm missing something.

mzmcbride commented 8 years ago

I took your script with the necessary import lines and ran it a few times. Here's what I got:

mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
1.04104304314
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.576555013657
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.634619951248
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
3.55164194107
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.607800960541
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
2.19808292389
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.659627914429
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
1.16318798065
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.809925079346
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.596135139465
mzmcbride@gonzo:~$ ./wikitools-issues-34.py 
0.724714040756

You can view lag at https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb=.
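
If you want to check it from Python instead, here is a quick sketch using the same wikitools calls as your script (untested; the result layout is just what that URL returns):

from wikitools import wiki
from wikitools import api

# Same replication-lag query as the URL above: meta=siteinfo, siprop=dbrepllag
site = wiki.Wiki("https://en.wikipedia.org/w/api.php")
params = {
    'action': 'query',
    'meta': 'siteinfo',
    'siprop': 'dbrepllag',
    'sishowalldb': '',
    'format': 'json',
}
result = api.APIRequest(site, params).query()
for db in result['query']['dbrepllag']:
    # Each entry reports a database host and its replication lag in seconds
    print("%s: %s" % (db['host'], db['lag']))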

If I do time curl "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&rnfilterredir=all&redirects=&format=json" a few times, I get about 0.3 seconds. Setting up the Wiki() object probably accounts for the additional overhead. Your code looks fine. You could set &rnlimit= to a higher value to get more random pages in a single query.

If you're getting 14 seconds of server lag... I'm not sure what's causing that. The production server admin log (https://wikitech.wikimedia.org/wiki/Server_Admin_Log) doesn't indicate that lag has been high lately.
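
To illustrate the rnlimit suggestion, something along these lines should work (just a sketch; 10 is an arbitrary value):

from wikitools import wiki
from wikitools import api

site = wiki.Wiki("https://en.wikipedia.org/w/api.php")

def random_article_ids(limit=10):
    # One round trip that returns several random article ids instead of one
    params = {
        'action': 'query',
        'list': 'random',
        'rnnamespace': 0,
        'rnfilterredir': 'all',
        'rnlimit': limit,
        'format': 'json',
    }
    result = api.APIRequest(site, params).query()
    return [page['id'] for page in result['query']['random']]

print(random_article_ids())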

gg4u commented 8 years ago

Hi @mzmcbride, thank you for the tip on viewing lag. I tried moving the Wiki() object to a global, so it is declared only once. I'm on another network now, so I can't run the same test.
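
Roughly what I changed (a sketch; the point is just that Wiki() gets constructed once and reused):

from wikitools import wiki
from wikitools import api

# Built once at module level, so each call only pays for the HTTP round trip
SITE = wiki.Wiki("https://en.wikipedia.org/w/api.php")

def search_wikipedia_random():
    params = {
        'action': 'query',
        'list': 'random',
        'rnnamespace': 0,
        'rnfilterredir': 'all',
        'rnlimit': 1,
        'format': 'json',
    }
    result = api.APIRequest(SITE, params).query()
    return result['query']['random'][0]['id']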

One additional piece of information: a few times I see spikes of more than a second, reaching 1, 2, even 4 s. I'm running this as a single test; I wonder whether the lag would increase or become more frequent if I used wikitools in an API for public use, with more requests.

Does lag time depend on the number of connections coming from one domain? I would like to use full-text search queries as the entry point for a site, i.e. the list=search and generator=search modules.
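
For context, the kind of entry-point query I have in mind would look roughly like this (a sketch; the search term and srlimit are placeholders):

from wikitools import wiki
from wikitools import api

site = wiki.Wiki("https://en.wikipedia.org/w/api.php")

def search_titles(term, limit=10):
    # Full-text search (list=search) over the article namespace, returning matching titles
    params = {
        'action': 'query',
        'list': 'search',
        'srsearch': term,
        'srnamespace': 0,
        'srlimit': limit,
        'format': 'json',
    }
    result = api.APIRequest(site, params).query()
    return [hit['title'] for hit in result['query']['search']]

print(search_titles("server lag"))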