Impetus / kundera

A JPA 2.1 compliant Polyglot Object-Datastore Mapping Library for NoSQL Datastores. Please subscribe to:
http://groups.google.com/group/kundera-discuss/subscribe
Apache License 2.0

Overhead does not decrease on MongoDB Cluster for read operation #744

Open vreniers opened 9 years ago

vreniers commented 9 years ago

Hi,

I've performed various benchmarks on a single-node setup (with MongoDB and YCSB running on the same machine) and on a MongoDB cluster of 9 nodes (1 router server, 3 config servers and 5 database shards), with no replica sets but with sharding enabled on the collection to distribute the objects evenly across the nodes.

When evaluating the performance of this abstraction layer, I make one key assumption: the overhead induced by the abstraction layer is constant per operation. On a local machine there is no network delay or packet travel time. Since the overhead should in theory remain constant, Kundera's relative overhead should shrink as network delay increases the total runtime.
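To make that assumption concrete, here is a small illustrative sketch (hypothetical numbers, not measurements from this thread): with a fixed per-operation layer cost c, the relative overhead c / (t + c) shrinks as the native per-operation time t grows.

```java
public class ConstantOverheadModel {

    /**
     * Relative overhead when the abstraction layer adds a fixed cost c
     * on top of a native per-operation time t (both in the same units).
     */
    static double relativeOverhead(double t, double c) {
        return c / (t + c);
    }

    public static void main(String[] args) {
        double c = 10e-6; // assume 10 us of fixed layer cost per read (illustrative)
        // Local read (~100 us native): overhead is about 9%.
        System.out.println(relativeOverhead(100e-6, c));
        // Remote read (~500 us native): same fixed cost, overhead falls to about 2%.
        System.out.println(relativeOverhead(500e-6, c));
    }
}
```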

In my observations, however, the overhead does not decrease as expected for the read operation; in fact it increases slightly compared to the read overhead on the local machine. The overhead does decrease by a fair percentage for the write, read-update and update operations.

There is little doubt in my results, as I have taken a very large sample size, reading in various steps from 100K up to 1,000K (1 million) records from a large data set.

Is there any possible explanation as to why the overhead for reads does not decrease in a similar fashion on the cluster? My setup uses single-threaded execution, with a cache clear after every 1000 operations. Could it have something to do with the use of transactions?

devender-yadav commented 9 years ago

@vreniers
The read overhead should not increase on the cluster because of the abstraction layer. Are you comparing the cumulative time for each operation performed via Kundera with and without the cluster setup and observing a lag?

vreniers commented 9 years ago

@devender-yadav

I'm comparing the runtime for 1 million reads on the cluster and on the local node, with and without Kundera. I have taken many samples with YCSB and executed this read workload many times.

These are the results I have:

Read local node, Kundera: ~91 seconds
Read local node, native MongoDB API: ~84 seconds
Overhead: 10%

Read cluster, Kundera: 492 seconds
Read cluster, native MongoDB API: 413 seconds
Overhead: 19%

This is not what I would expect. Assuming the overhead of a Kundera read operation is constant, it should remain constant per operation on the cluster as well; the overall runtime increases, so the relative overhead compared to the native MongoDB API should decrease.
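Working the reported numbers through that constant-overhead assumption (a sketch using only the figures above): the local runs imply roughly 7 microseconds of layer cost per read, which would predict under 2% relative overhead on the cluster, far below the ~19% actually observed.

```java
public class OverheadCheck {

    static final double OPS = 1_000_000.0; // reads per run

    /** Per-operation layer cost implied by the local runs, in seconds. */
    static double perOpCost() {
        return (91.0 - 84.0) / OPS; // ~7 microseconds per read
    }

    /** Relative overhead expected on the cluster if that cost stayed constant. */
    static double expectedClusterOverhead() {
        return (perOpCost() * OPS) / 413.0; // 7 s of layer cost over 413 s native
    }

    /** Relative overhead actually observed on the cluster. */
    static double observedClusterOverhead() {
        return (492.0 - 413.0) / 413.0;
    }

    public static void main(String[] args) {
        System.out.printf("expected: %.1f%%, observed: %.1f%%%n",
                100 * expectedClusterOverhead(),  // ~1.7%
                100 * observedClusterOverhead()); // ~19.1%
    }
}
```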

I'm using transactions and a single thread in YCSB. Any ideas on what causes this behavior?

devender-yadav commented 9 years ago

@vreniers

Could you please share a few points for clarification:

-Devender

vreniers commented 9 years ago

Persistence.xml file:

<persistence-unit name="kundera-mongodb">
        <provider>com.impetus.kundera.KunderaPersistence</provider>
        <properties>
            <!-- <property name="kundera.nodes" value="192.168.145.168" /> -->
            <property name="kundera.nodes" value="localhost" />
            <property name="kundera.port" value="27017" />
            <property name="kundera.keyspace" value="kundera" />
            <property name="kundera.dialect" value="mongodb" />
            <property name="kundera.client.lookup.class"
                value="com.impetus.client.mongodb.MongoDBClientFactory" />
<!--            <property name="kundera.cache.provider.class" value="com.impetus.kundera.cache.ehcache.EhCacheProvider" />           -->
<!-- <property name="kundera.cache.config.resource" value="/ehcache-test.xml" /> -->
<!--            <property name="kundera.pool.size.max.active" value="5" /> -->
<!--            <property name="kundera.pool.size.max.total" value="5" /> -->
            <property name="kundera.client.property" value="kunderaMongoTest.xml" />
        </properties>
</persistence-unit>

The kunderaMongoTest.xml contains read.preference primary. However, this should not matter, since I'm not using replica sets. The connection pool was enabled during the benchmark, but since I'm only testing with a single thread in YCSB, this should not influence performance: each read has to wait for the previous one, so multiple connections can't be used. The same settings were used on the local node.

This is my read operation in Kundera.

EntityManager em = emf.createEntityManager();

User u = em.find(User.class, key);

// clear the persistence context every 1000 operations
if (amountOps++ % 1000 == 0)
    em.clear();

em.close();
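For reference, a minimal plain-Java sketch of how such per-operation timing can be gathered; `doRead` here is a hypothetical stand-in for the actual `em.find(User.class, key)` call, not the benchmark's real code.

```java
import java.util.concurrent.ThreadLocalRandom;

public class ReadTimingSketch {

    // Hypothetical stand-in for the real em.find(User.class, key) call.
    static long doRead(long key) {
        return key ^ ThreadLocalRandom.current().nextLong(); // dummy work
    }

    /** Average wall-clock time per operation in nanoseconds over n runs. */
    static double avgNanosPerOp(int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < n; i++) {
            sink += doRead(i);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(); // keep the loop from being optimized away
        return elapsed / (double) n;
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f ns/op%n", avgNanosPerOp(1_000_000));
    }
}
```

Averaging over a large n, as above, is what makes a small constant per-operation cost visible against the much larger network round-trip time.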

Logging should be disabled. The YCSB MongoDB client is based on the one from your benchmark, though I've modified it slightly to bring it up to date with the latest driver.

devender-yadav commented 9 years ago

@vreniers

What logging level is used with Kundera and the native Mongo client, and what is the request distribution: uniform or zipfian? It would help in verifying the issue if you could share the native client code. You can send it to kundera@impetus.co.in.

-Devender