ReactiveCouchbase / ReactiveCouchbase-core

Core library for ReactiveCouchbase
Apache License 2.0

View Response in write-intensive loads #47

Closed behrad closed 10 years ago

behrad commented 10 years ago

I'm evaluating Couchbase in a scenario where I have about 500 writes/second (using set to store new requests). For each key, a few seconds later, I need to query a view (so it can't be stale), and then do an update on that key (get to fetch the document, then set with the updated document).

The problem is that I constantly see Couchbase indexing that view, and under high load my view result futures are called back very late, a minute or so afterwards. Am I misusing the API? Are there any configuration or tuning options?

P.S. I'm a heavy CouchDB user, and had no problems in such a case with that.

mathieuancelin commented 10 years ago

I think it's normal for Couchbase to reindex the view often when you write a lot of content, since the view tries to index any new content.

However, it should not block queries.

Can you show me how you perform your queries against the view, and the query itself (even with fake values, names, etc.)?

behrad commented 10 years ago

When setting:

bucket.set[Message](message.uuid.get, message)

on an external event after a few seconds:

val query = new Query().setReduce(false).setStale( Stale.FALSE ).setKey( JsString(status.Id.toString).toString() )
store.bucket.rawSearch("mt", "byId")(query).headOption.map {
  case Some(row) =>
    store.update( row.value, { message =>
      message.copy( status = Some(message.status.getOrElse(Array()) ++ Array(status.getStatusMesssage)) )
    }).onFailure {
      case status => log.info( s"Couldn't update $status" )
    }
  case None => log.error( s"Message not found with tId='${status.Id}' or view 'mt/byId' not up-to-date" )
}

where store.update is defined as:

override def update(messageId: String, updateFunc: (MTMessage) => MTMessage): Future[Any] = {
  for {
    doc <- bucket.get[MTMessage]( messageId/*, CouchbaseExpirationTiming_byDuration( 2 seconds )*/ )
    updated <- {
      bucket.replace[MTMessage](messageId, updateFunc( doc.get ))
    }
  } yield updated
}

and I'm still getting some "Message not found with..." log messages minutes later, which suggests that either some of my set operations have not been executed yet or the view is stale!

"it should not block queries"

It is normal for a query to block when requesting a non-stale view; however, I wouldn't expect latencies on that scale from Couchbase.

mathieuancelin commented 10 years ago

Okay, I think the problem comes from `query.setStale(Stale.FALSE)`. It means that the query result is not returned until all pending documents are indexed. Try with `Stale.OK` or `Stale.UPDATE_AFTER`.

http://docs.couchbase.com/couchbase-manual-2.0/#index-updates-and-the-stale-parameter

behrad commented 10 years ago

I know this, @mathieuancelin. I can't query stale (stale=ok/update_after) views, since I need to look up the most recently written docs! With update_after I may not find my key in the view results!

mathieuancelin commented 10 years ago

Oh, sorry, I did not understand that need.

Then I guess you can't really do anything. This is the eventually consistent part of Couchbase views.

Did you try asking the Couchbase guys about it?

behrad commented 10 years ago

I've posted the same topic to the Couchbase Google group. I'm not sure whether BULK operations would help. I have no problems with CouchDB at that rate :)

chenbekor commented 10 years ago

Maybe you should rethink your model. If you share some information about your use case, we can try to help you out with a different design that doesn't require views.

behrad commented 10 years ago

Consider a message gateway/proxy that receives many hundreds of routing requests per second. It should store the requests plus their status updates; for each request, the system updates different fields of that document (unfortunately Couchbase has no update handlers like CouchDB's, which do updates in place). The view is needed because updates arrive keyed by the document.routeId field, not by the document keys (UUIDs in my case). So I wrote a view that maps document.routeId -> UUID, so that I can reverse-look-up my key from the routeId and then update that key with the status message.

I see that these should help: 1) using bulk operations; 2) doing the operations in memory and writing late, once the updates are finished, possibly in BULK.

But both would make my app's code more complex, and I would like to rely on Couchbase to do the heavy persistence for me (like Redis, CouchDB, ...). I don't see how I could redesign my app using Couchbase, or use different APIs to achieve this, @chenbekor :)
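As a rough illustration of option 2 above (collecting updates in memory and writing them late), a minimal write-behind buffer might look like the sketch below. `StatusBuffer`, `flushBatch` and `maxPending` are hypothetical names invented for this example and are not part of ReactiveCouchbase or the Couchbase SDK; the flush callback is where a real bulk write would go.

```scala
import scala.collection.mutable

// Hypothetical write-behind buffer: coalesces status updates per document key
// in memory and hands them to `flushBatch` once `maxPending` updates are queued.
// `flushBatch` stands in for whatever bulk write the application would perform.
final class StatusBuffer(flushBatch: Map[String, Vector[String]] => Unit, maxPending: Int) {
  private val pending = mutable.LinkedHashMap.empty[String, Vector[String]]
  private var count = 0

  // Queue one status update for `key`; triggers a flush when the buffer is full.
  def add(key: String, status: String): Unit = synchronized {
    pending.update(key, pending.getOrElse(key, Vector.empty) :+ status)
    count += 1
    if (count >= maxPending) flushLocked()
  }

  // Force a flush, e.g. from a timer or on shutdown.
  def flush(): Unit = synchronized(flushLocked())

  // Must be called while holding the lock; does nothing if the buffer is empty.
  private def flushLocked(): Unit =
    if (pending.nonEmpty) {
      val batch = pending.toMap
      pending.clear()
      count = 0
      flushBatch(batch)
    }
}
```

The trade-off is the one raised above: the application takes on extra complexity, and anything still sitting in the buffer is lost if the process dies before a flush.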

chenbekor commented 10 years ago

I would like to help more. I suggest we set solutions aside for a while until I get a clear understanding of the problem domain ;)

So I do understand the gateway/proxy part, but the request-updates part is a bit vague. Can you elaborate a bit more: what exactly is the flow?

behrad commented 10 years ago

For every request, I store it in Couchbase under a local UUID. I then call an external REST service, which in turn returns a ticketId (routeId), and I update that document (UUID) with the ticketId from the external service's response. After a while, I'm notified with a (routeId, statusMessage) pair, so I have to update the document whose UUID is associated with that routeId and store the latest statusMessage.

mathieuancelin commented 10 years ago

You can always store the routeId -> UUID mapping directly in a bucket when the external service returns, so you will be able to find the right document and update it later.

You will need two round trips to do that, but that's always better than using a view.
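A minimal sketch of that lookup-document idea, assuming a generic asynchronous key-value store: `KvStore`, `MessageStore`, the simplified `Message` class and the `route::` key prefix are all hypothetical stand-ins made up for illustration (an in-memory map plays the role of the bucket), not the ReactiveCouchbase API.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical in-memory stand-in for a key-value bucket (not a real Couchbase client).
final class KvStore[V](implicit ec: ExecutionContext) {
  private val data = TrieMap.empty[String, V]
  def set(key: String, value: V): Future[Unit] = Future { data.put(key, value); () }
  def get(key: String): Future[Option[V]] = Future(data.get(key))
}

// Simplified message document, for illustration only.
final case class Message(uuid: String, routeId: Option[String] = None, status: Vector[String] = Vector.empty)

final class MessageStore(messages: KvStore[Message], routeIndex: KvStore[String])(implicit ec: ExecutionContext) {
  private def routeKey(routeId: String): String = s"route::$routeId"

  def create(msg: Message): Future[Unit] = messages.set(msg.uuid, msg)

  // When the external service returns: write the routeId -> UUID mapping document
  // and record the routeId on the message. Returns false if the message is unknown.
  def attachRoute(uuid: String, routeId: String): Future[Boolean] =
    messages.get(uuid).flatMap {
      case Some(msg) =>
        for {
          _ <- routeIndex.set(routeKey(routeId), uuid)
          _ <- messages.set(uuid, msg.copy(routeId = Some(routeId)))
        } yield true
      case None => Future.successful(false)
    }

  // On a (routeId, statusMessage) notification: two key lookups, no view involved.
  def appendStatus(routeId: String, statusMessage: String): Future[Boolean] =
    routeIndex.get(routeKey(routeId)).flatMap {
      case Some(uuid) =>
        messages.get(uuid).flatMap {
          case Some(msg) =>
            messages.set(uuid, msg.copy(status = msg.status :+ statusMessage)).map(_ => true)
          case None => Future.successful(false)
        }
      case None => Future.successful(false)
    }
}
```

Both lookups are plain key reads, so they do not depend on view indexing. Note that the get-then-set in this sketch is not atomic; against a real store, concurrent status updates for the same document would need some form of compare-and-swap or retry.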

behrad commented 10 years ago

I'd thought about that, yeah, @mathieuancelin. I was eager to put Couchbase under heavy write stress to see if it behaves like CouchDB, but it didn't. I thought maybe my API usage was wrong, and that somebody might have experienced this before... I think I should finally accept the space overhead of using an auxiliary bucket for the ID associations. The unclear point for me is the efficiency of database maintenance. Should I clear out docs? Compact frequently? Should I store data monthly in separate partitioned buckets?
There are a few hundred million requests per month.