avinashjoshi / walkaround

Automatically exported from code.google.com/p/walkaround
Apache License 2.0
0 stars 1 forks source link

Need to cache current state of each slob in memcache #99

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Deploy full-text-search walkaround 
2. Import waves from google wave that have several participants and lots of 
text. (In my case 5 participants and one wave bot that doesn't function because 
I haven't implemented robot-apis. It has 44 entries in the wave and most are 
several paragraphs of text with multiple colors, font size and typeface)
3. Navigate through the wave with up and down arrows. Edit an entry. Add an 
entry. Etc.

What is the expected output?  What do you see instead?
I expected to be able to use walkaround as a free google app, but it is 
exceeding the maximum free "Datastore Read Ops" so my walkround instance is 
inaccessible for a lot of the day.

Only about 15 minutes of actual typing by one user over the span of an hour 
caused it to reach 0.05 Million Datastore Read Ops yesterday. Only about 5 
minutes of typing by one user caused 0.02 Million Datastore Read Ops today.

What browser and browser version are you using?  On what operating system? 
Firefox 11 on Windows 7

What URL does your browser show when the problem occurs?  Did you compile
walkaround on your machine, or are you using a public instance?

https://kwalkaround.appspot.com/

I compiled the instance on my Mac OSX 10.6.8 and deployed it to google's app 
engine, but the site is being accessed by other users from other machines 
through the website.

Please provide any additional information below.

This could be associated with Issue 7, but since Issue 7 has been labeled 
fixed, and those code changes were merged with the full-text-search branch, I 
thought this might be a different issue with the same symptoms. Also, maybe 
there are settings I could enable or change that would help, but I do not know 
what those would be.

Here is my Quota Detail information for Storage:

Storage
Datastore Write Operations 0.00 Million Ops     Okay
Datastore Read Operations 0.02 Million Ops  Okay
Datastore Small Operations 0.00 Million Ops     Okay
Datastore API Calls 4,287   Okay
Datastore Queries 1,306     Okay
Blobstore API Calls 0   Okay
Datastore Stored Data 0.01 GBytes   Okay
Blobstore Stored Data 0.01 GBytes   Okay
Data Sent to Datastore API 0.00 GBytes  Okay
Data Received from Datastore API 0.02 GBytes    Okay
Datastore Entity Fetch Ops 17,075   Okay
Datastore Entity Put Ops 470    Okay
Datastore Entity Delete Ops 73  Okay
Datastore Index Write Ops 1,347     Okay
Datastore Query Ops 1,311   Okay
Datastore Key Fetch Ops 295     Okay
Datastore Id Allocation Ops 1   Okay
Number of Indexes 3%3%  6 of 200    Okay
Prospective Search Subscriptions 0%0%   0 of 10,000     Okay
Logs Stored Data 0.05 GBytes    Okay 

Original issue reported on code.google.com by otter...@gmail.com on 12 Apr 2012 at 8:08

GoogleCodeExporter commented 9 years ago
You can use wavereactor.appspot.com instance to import waves - it has more 
quota.
I don't think that Google ever promised to provide free, non limited Walkaround 
instance to anyone.

Original comment by vega113 on 12 Apr 2012 at 9:26

GoogleCodeExporter commented 9 years ago
Thanks, I'll check that out. I wasn't really saying google promised it would be 
free, though. I just thought that a couple of friends using a walkaround 
instance a few times a day shouldn't put it over google's quota maximum. Unless 
google's quota is insanely low and this is expected behavior?

Original comment by otter...@gmail.com on 12 Apr 2012 at 9:31

GoogleCodeExporter commented 9 years ago
Per the rest of the issue text, the bigger concern is whether Walkaround is 
using excessive read operations along the same lines as  Issue 7 , I believe.   
Very small changes causing thousands of reads would likely impact project 
adoption, even (or especially) for users paying for quota.

Original comment by betterth...@gmail.com on 12 Apr 2012 at 9:32

GoogleCodeExporter commented 9 years ago
Can you go to /admin/appstats and check in the "RPC Stats" section which RPCs 
are being called too much (e.g. datastore_v3.Get), and which paths are calling 
them the most (e.g. POST /channel)?  Maybe just expand the top few RPCs and 
paste the output here.

Original comment by oh...@google.com on 12 Apr 2012 at 11:20

GoogleCodeExporter commented 9 years ago
Here is the list. None of those counts seems like much compared to the number 
of Datastore read Ops the Admin Console is listing. When I googled this issue 
to see if I could locate what was causing all the reads I did find some forum 
posts indicating that the RCP counts might not tell you every read that is 
happening, which is unfortunate, but maybe it will still show where the excess 
is coming from. Here is one of those links: 
http://stackoverflow.com/questions/7807529/how-many-datastore-reads-consume-each
-fetch-count-and-query-operations

memcache.Get    1366
POST /channel   1326
/wave   21
/photos 7
/   4
/contacts   4
/inbox  4
 memcache.Set   277
POST /channel   263
/wave   11
/   1
/photos 1
POST /_ah/queue/__deferred__    1
 datastore_v3.RunQuery  151
POST /channel   138
/wave   13
 datastore_v3.BeginTransaction  82
POST /channel   69
/wave   10
/   1
/contacts   1
/photos 1
 datastore_v3.Rollback  81
POST /channel   69
/wave   10
/   1
/photos 1
 datastore_v3.Get   74
POST /channel   69
/wave   3
/   1
/photos 1
 datastore_v3.Next  17
/wave   17

Original comment by otter...@gmail.com on 13 Apr 2012 at 12:15

GoogleCodeExporter commented 9 years ago
The appstats data is only stored in memcache, so any part or all of it can 
indeed go missing at any point.  But there's a good chance that it will point 
us in the right direction anyway.  Can you try your five minutes of typing 
again that cause 20,000 datastore ops, and tell us what appstats shows 
immediately after that?

Original comment by oh...@google.com on 13 Apr 2012 at 12:53

GoogleCodeExporter commented 9 years ago
Okay, I replicated my issue. I opened a new tab in Chrome on Windows 7 and 
logged into the admin console to see the datastore quota prior to starting 
typing in my wave:

Storage
Datastore Write Operations           0.01 Million Ops    Okay
Datastore Read Operations            0.11 Million Ops    Okay
Datastore Small Operations           0.00 Million Ops    Okay
Datastore API Calls          13,870  Okay
Datastore Queries            4,564   Okay
Blobstore API Calls          0   Okay
Datastore Stored Data            0.01 GBytes     Okay
Blobstore Stored Data            0.01 GBytes     Okay
Data Sent to Datastore API           0.00 GBytes     Okay
Data Received from Datastore API             0.15 GBytes     Okay
Datastore Entity Fetch Ops           105,798     Okay
Datastore Entity Put Ops             1,384   Okay
Datastore Entity Delete Ops          73  Okay
Datastore Index Write Ops            3,771   Okay
Datastore Query Ops          4,578   Okay
Datastore Key Fetch Ops          895     Okay
Datastore Id Allocation Ops          1   Okay
Number of Indexes        3%  6 of 200    Okay
Prospective Search Subscriptions         0%  0 of 10,000     Okay
Logs Stored Data             0.05 GBytes     Okay

Here is the datastore quota AFTER doing 5 minutes of typing in my wave:
Storage
Datastore Write Operations           0.01 Million Ops    Okay
Datastore Read Operations            0.13 Million Ops    Okay
Datastore Small Operations           0.00 Million Ops    Okay
Datastore API Calls          17,106  Okay
Datastore Queries            5,490   Okay
Blobstore API Calls          0   Okay
Datastore Stored Data            0.01 GBytes     Okay
Blobstore Stored Data            0.01 GBytes     Okay
Data Sent to Datastore API           0.00 GBytes     Okay
Data Received from Datastore API             0.17 GBytes     Okay
Datastore Entity Fetch Ops           122,766     Okay
Datastore Entity Put Ops             1,768   Okay
Datastore Entity Delete Ops          73  Okay
Datastore Index Write Ops            4,855   Okay
Datastore Query Ops          5,461   Okay
Datastore Key Fetch Ops          1,276   Okay
Datastore Id Allocation Ops          1   Okay
Number of Indexes        3%  6 of 200    Okay
Prospective Search Subscriptions         0%  0 of 10,000     Okay
Logs Stored Data             0.05 GBytes     Okay

You can see the "Datastore Read Ops" went up by 0.02 Million and the "Datastore 
Entity Fetch Ops" went up by about 17k.

And here is the RPC stat info you wanted for when this happens:

RPC Count
 memcache.Get   1158
POST /submitdelta   894
POST /channel   157
POST /taskqueue/postcommit  90
/wave   11
/contacts   2
/photos 2
/   1
/inbox  1
 datastore_v3.RunQuery  594
POST /submitdelta   290
POST /taskqueue/postcommit  265
POST /channel   32
/wave   7
 datastore_v3.BeginTransaction  426
POST /submitdelta   232
POST /taskqueue/postcommit  156
POST /channel   31
/wave   5
/contacts   2
 memcache.Set   342
POST /submitdelta   255
POST /channel   47
POST /taskqueue/postcommit  33
/wave   5
/contacts   1
POST /_ah/queue/__deferred__    1
 channel.SendChannelMessage 299
POST /submitdelta   299
 datastore_v3.Commit    260
POST /submitdelta   226
POST /taskqueue/postcommit  33
/contacts   1
 datastore_v3.Put   260
POST /submitdelta   226
POST /taskqueue/postcommit  33
/contacts   1
 datastore_v3.Next  232
POST /submitdelta   212
POST /taskqueue/postcommit  20
 datastore_v3.Rollback  166
POST /taskqueue/postcommit  123
POST /channel   31
POST /submitdelta   6
/wave   5
/contacts   1
 search.IndexDocument   128
POST /taskqueue/postcommit  128
 datastore_v3.Get   84
POST /taskqueue/postcommit  47
POST /submitdelta   34
/contacts   1
/wave   1
POST /channel   1
 logservice.Flush   38
 taskqueue.BulkAdd  25
 urlfetch.Fetch 8
 channel.CreateChannel  1
 memcache.Delete    1
 search.Search  1

Original comment by otter...@gmail.com on 13 Apr 2012 at 5:46

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Oh, also, what I typed in the wave ended up being 881 characters, not including 
any time I backspaced, if that helps.

Original comment by otter...@gmail.com on 13 Apr 2012 at 5:50

GoogleCodeExporter commented 9 years ago
Looking at the numbers:

RPCs related to entity fetch ops:

 datastore_v3.RunQuery  594
POST /submitdelta   290
POST /taskqueue/postcommit  265
POST /channel   32
/wave   7

 datastore_v3.Next  232
POST /submitdelta   212
POST /taskqueue/postcommit  20

 datastore_v3.Get   84
POST /taskqueue/postcommit  47
POST /submitdelta   34
/contacts   1
/wave   1
POST /channel   1

other datastore RPCs:

 datastore_v3.BeginTransaction  426
POST /submitdelta   232
POST /taskqueue/postcommit  156
POST /channel   31
/wave   5
/contacts   2

 datastore_v3.Commit    260
POST /submitdelta   226
POST /taskqueue/postcommit  33
/contacts   1

 datastore_v3.Put   260
POST /submitdelta   226
POST /taskqueue/postcommit  33
/contacts   1

 datastore_v3.Rollback  166
POST /taskqueue/postcommit  123
POST /channel   31
POST /submitdelta   6
/wave   5
/contacts   1

non-datastore RPCs:

 channel.SendChannelMessage 299
POST /submitdelta   299

 memcache.Get   1158
POST /submitdelta   894
POST /channel   157
POST /taskqueue/postcommit  90
/wave   11
/contacts   2
/photos 2
/   1
/inbox  1

 memcache.Set   342
POST /submitdelta   255
POST /channel   47
POST /taskqueue/postcommit  33
/wave   5
/contacts   1
POST /_ah/queue/__deferred__    1

 search.IndexDocument   128
POST /taskqueue/postcommit  128

 logservice.Flush   38
 taskqueue.BulkAdd  25
 urlfetch.Fetch 8
 channel.CreateChannel  1
 memcache.Delete    1
 search.Search  1

I realize now that I should have asked you to go to /admin and clear memcache 
first to reset these counters to zero before the experiment.

But, given the above numbers, it looks like most of the read ops are in the 
reconstruction of the wavelet state during submitdelta and post-commit.  You 
could edit webinf/flags.json and increase post_commit_action_interval_millis to 
e.g. 300000 to reduce the number of datastore ops consumed by post-commit. 
(This will also make waves take five minutes to show up in new participants' 
inboxes -- we should probably short-circuit AddParticipant similarly to how we 
treat RemoveParticipant.)

The long-term solution to this is probably to cache the current state of each 
wavelet in memcache to avoid the datastore accesses (this can also be used to 
make waves load faster).  It's not hard, but I'm not sure when I'll get around 
to it.  Export-related bugs are a higher priority at the moment.

Another solution would be to make affinity work with dynamic backends so that 
MutationLog.StateCache is more effective, increase 
post_commit_action_interval_millis, and short-circuit AddParticipant.  Caching 
slob states seems more useful overall though.

Original comment by oh...@google.com on 15 Apr 2012 at 4:35