SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0
805 stars 171 forks

Continuous insertion of massive data is very slow? #67

Closed hxrain closed 8 years ago

hxrain commented 8 years ago

Hi: I have tested inserting several million documents. It starts quickly, but slows down afterward. Every so often, a single insert takes over 40 seconds!! At the same time, inserting directly through the MongoDB API stays fast. What is going on? The test code is as follows:

import java.io.IOException;
import org.apache.commons.codec.binary.Base64;
import org.apache.http.HttpResponse;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

public class RestheartSubmiter {
    private static org.apache.http.client.HttpClient client4 = null;
    private static org.apache.http.impl.conn.PoolingHttpClientConnectionManager conMgr = null;
    private static String yourCollectionName = "CLTX";
    private static String url = "http://192.168.138.249:20080/ivms/";
    static {
        conMgr = new org.apache.http.impl.conn.PoolingHttpClientConnectionManager();
        conMgr.setMaxTotal(20);
        conMgr.setDefaultMaxPerRoute(conMgr.getMaxTotal());
        RequestConfig defaultRequestConfig = RequestConfig.custom()
                .setSocketTimeout(5 * 1000)
                .setConnectTimeout(5 * 1000)
                .setConnectionRequestTimeout(5 * 1000)
                .setStaleConnectionCheckEnabled(true)
                .build();
        client4 = HttpClients
                    .custom()
                    .setConnectionManager(conMgr)
                    .setDefaultRequestConfig(defaultRequestConfig)
                    .build();
    }
    public static HttpPut getHttpPut(String tbName, String docid) 
    {
        HttpPut pum = new HttpPut(url + tbName + "/" + docid);
        pum.addHeader("Authorization","Basic " + Base64.encodeBase64String(("a:a").getBytes()));
        pum.addHeader("Content-type", "application/json");
        return pum;
    }

    public static void main(String[] args) throws Exception {
        HttpPut pm = null;
        HttpResponse hr = null;
        String id = null;
        String json = "{\"aqdzt\":\"0\",\"chdsj\":\"1\",\"cjsj\":\"2015-11-01 00:00:01\",\"cllx\":\"K33\",\"clpp\":\"1\",\"clsd\":\"90\",\"csdbh\":\"999\",\"csys\":\"9\",\"cwkc\":\"3\",\"dhzt\":\"1\",\"gxsj\":\"2015-11-01 00:00:01\",\"hphm\":\"测A12345\",\"hpwz\":\"X111Y111\",\"hpys\":\"1\",\"hpzl\":\"01\",\"id\":\"c7cf733c-9f37-4ef2-889f-e44a6103c95c\",\"jgsj\":\"2015-11-01 00:00:01\",\"jllx\":\"1\",\"qmtp\":\"d:/一二三四五上山打老虎老虎没打着打到小松鼠一二三四五上山打老虎老虎没打着打到小松鼠一二三四五上山打老虎老虎没打着打到小松鼠.jpg\",\"sbbh\":\"999\",\"sbzt\":\"0\",\"tplx\":\"0\",\"wzlx\":\"1\",\"xscd\":\"1\",\"xsfx\":\"1\",\"zybzt\":\"1\"}";
        JSONObject jobj = JSON.parseObject(json);

        while (true) {
            id = java.util.UUID.randomUUID().toString();
            jobj.put("_id", id);

            pm = RestheartSubmiter.getHttpPut(yourCollectionName, id);
            StringEntity sre = new StringEntity(jobj.toJSONString(), "UTF-8");
            pm.setEntity(sre);
            long s = System.currentTimeMillis();
            hr = client4.execute(pm);
            long e = System.currentTimeMillis();
            // log any request that takes longer than 3 seconds
            if ((e - s) > 3 * 1000)
                System.out.println(e + "\tthe restheart PUT took " + (e - s) + " ms");
            // consume the response entity before releasing the connection,
            // so the pooled connection can be reused
            EntityUtils.consume(hr.getEntity());
            pm.releaseConnection();
        }
    }

}
ujibang commented 8 years ago

I ran a test using your code, creating one million documents, having modified the code to print out the partial times every 10,000 documents.
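The timing modification described above can be sketched like this. This is only an illustration (the `timeBatches` helper is hypothetical, not the actual test code), and the no-op `Runnable` stands in for the `HttpPut` round trip from the test code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchTimer {
    // Runs `total` inserts and prints the elapsed time for every `batchSize`
    // of them, mirroring the "partial times every 10,000 documents" logging.
    // Returns the list of per-batch timings in milliseconds.
    static List<Long> timeBatches(int total, int batchSize, Runnable insert) {
        List<Long> partials = new ArrayList<>();
        long batchStart = System.currentTimeMillis();
        for (int i = 1; i <= total; i++) {
            insert.run(); // in the real test: client4.execute(pm)
            if (i % batchSize == 0) {
                long now = System.currentTimeMillis();
                System.out.println("took " + (now - batchStart) / 1000.0
                        + " secs for " + batchSize + " documents");
                partials.add(now - batchStart);
                batchStart = now;
            }
        }
        return partials;
    }

    public static void main(String[] args) {
        // simulated insert: a no-op stands in for the HTTP PUT
        timeBatches(50_000, 10_000, () -> {});
    }
}
```

Because the batch timer resets after each report, a sudden jump in one partial time pinpoints when the slowdown starts without averaging it away over the whole run.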

I couldn't reproduce your problem. All partial times are around 22 seconds, and I only got one request taking 4 seconds (your slow-request log).

You probably ran into CPU saturation or a network issue. Keep in mind that the test client, MongoDB, and RESTHeart processes can saturate the CPU when run on the same computer.

Can you repeat your test while tracing CPU utilization?

Memory shouldn't be a problem. In my case, the RESTHeart process took only about 140 MB.

However, this saturation does not happen on my laptop: during test execution, the RESTHeart process took no more than 20% of the CPU, indicating that it isn't under stress. This is because the test code is single-threaded. If you check the performance test results page in the documentation, you'll find a test that creates one million documents using 200 threads in 250 secs (vs 244 secs using the MongoDB driver directly).
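The 200-thread pattern can be sketched as follows. This is only an illustration of the threading structure (the `runInserts` name is hypothetical); each task would be replaced by the `HttpPut` round trip from the test code above, and the connection manager's pool would need to be sized to match (e.g. `setMaxTotal(200)` instead of 20):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelPutSketch {
    // Submits `docs` simulated inserts to a fixed pool of `threads` workers
    // and returns how many completed. In a real test, each task would build
    // an HttpPut (as in getHttpPut above) and call client4.execute(pm).
    static long runInserts(int threads, int docs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int i = 0; i < docs; i++) {
            pool.submit(completed::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        long n = runInserts(200, 100_000);
        System.out.println("completed " + n + " inserts in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}
```

With many concurrent requests in flight, per-request latency overlaps instead of accumulating, which is why the multi-threaded test approaches the raw driver's throughput.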

Here are the logs from my test:

Running org.restheart.test.performance.RestheartSubmitter
took 0.135 secs for 10000 documents
took 31.002 secs for 10000 documents
took 21.375 secs for 10000 documents
took 22.367 secs for 10000 documents
took 21.3 secs for 10000 documents
took 20.935 secs for 10000 documents
took 20.929 secs for 10000 documents
took 20.978 secs for 10000 documents
took 21.255 secs for 10000 documents
took 22.047 secs for 10000 documents
took 21.14 secs for 10000 documents
took 20.585 secs for 10000 documents
took 20.632 secs for 10000 documents
took 21.524 secs for 10000 documents
took 20.609 secs for 10000 documents
took 20.53 secs for 10000 documents
took 21.175 secs for 10000 documents
took 21.04 secs for 10000 documents
took 21.099 secs for 10000 documents
took 21.252 secs for 10000 documents
took 21.195 secs for 10000 documents
took 20.699 secs for 10000 documents
took 21.153 secs for 10000 documents
took 20.851 secs for 10000 documents
took 20.864 secs for 10000 documents
took 20.897 secs for 10000 documents
took 20.844 secs for 10000 documents
took 21.129 secs for 10000 documents
took 20.707 secs for 10000 documents
took 20.928 secs for 10000 documents
took 21.19 secs for 10000 documents
took 21.073 secs for 10000 documents
took 20.665 secs for 10000 documents
took 23.004 secs for 10000 documents
took 21.274 secs for 10000 documents
took 23.235 secs for 10000 documents
took 21.878 secs for 10000 documents
took 21.759 secs for 10000 documents
took 21.952 secs for 10000 documents
took 21.998 secs for 10000 documents
took 21.007 secs for 10000 documents
took 21.17 secs for 10000 documents
took 21.543 secs for 10000 documents
took 21.357 secs for 10000 documents
took 21.044 secs for 10000 documents
took 21.386 secs for 10000 documents
took 22.891 secs for 10000 documents
took 21.97 secs for 10000 documents
took 21.72 secs for 10000 documents
took 21.37 secs for 10000 documents
took 21.424 secs for 10000 documents
took 21.287 secs for 10000 documents
took 21.218 secs for 10000 documents
took 21.356 secs for 10000 documents
took 21.472 secs for 10000 documents
took 21.527 secs for 10000 documents
took 21.72 secs for 10000 documents
took 21.619 secs for 10000 documents
took 21.386 secs for 10000 documents
took 21.834 secs for 10000 documents
took 21.453 secs for 10000 documents
took 21.497 secs for 10000 documents
1447839518142   restheart PUT using time: 4691 mills
took 26.102 secs for 10000 documents
took 21.258 secs for 10000 documents
took 21.927 secs for 10000 documents
took 21.353 secs for 10000 documents
took 21.402 secs for 10000 documents
took 21.686 secs for 10000 documents
took 21.477 secs for 10000 documents
took 21.713 secs for 10000 documents
took 21.444 secs for 10000 documents
took 21.469 secs for 10000 documents
took 21.501 secs for 10000 documents
took 21.59 secs for 10000 documents
took 21.621 secs for 10000 documents
took 21.521 secs for 10000 documents
took 21.414 secs for 10000 documents
took 21.326 secs for 10000 documents
took 21.906 secs for 10000 documents
took 21.545 secs for 10000 documents
took 21.579 secs for 10000 documents
took 21.959 secs for 10000 documents
took 21.397 secs for 10000 documents
took 21.838 secs for 10000 documents
took 21.554 secs for 10000 documents
took 21.158 secs for 10000 documents
took 21.974 secs for 10000 documents
took 21.728 secs for 10000 documents
took 21.194 secs for 10000 documents
took 21.721 secs for 10000 documents
took 21.185 secs for 10000 documents
took 21.301 secs for 10000 documents
took 21.499 secs for 10000 documents
took 21.8 secs for 10000 documents
took 21.536 secs for 10000 documents
took 21.69 secs for 10000 documents
took 21.793 secs for 10000 documents
took 21.691 secs for 10000 documents
took 21.649 secs for 10000 documents
took 21.483 secs for 10000 documents
hxrain commented 8 years ago

Please keep it running continuously for 12 hours, then take another look.

caarto commented 8 years ago

When we run it for a long time, a regular pattern appears: a cycle of normal inserts takes about 70 s, then requests time out for a while. We are now starting the test on a new computer and waiting for the results.....

ujibang commented 8 years ago

@caarto any news from your tests?

caarto commented 8 years ago

Thanks for your concern. During our test, we found that the MongoDB database exhausted the server's resources (especially memory), so we changed our test case: we now run the MongoDB driver insertion and the RESTHeart insertion at the same time. Both tests slow down at the same time, and RESTHeart slows down more...... The test case inserts 10,000 documents per cycle, non-stop. Here is a snapshot of our test log using the MongoDB driver at different times: mongodbconsole.txt. And here is the RESTHeart log from the same period: restheartconsole.txt

If you need the entire log, give me your email....

ujibang commented 8 years ago

So you ran the test for about 32 hours, getting an average of 52 tps (transactions per second) with RESTHeart.

These are very low numbers!

For a performance test, you have to make sure that the test client, the network, and the servers themselves are not the bottleneck (see the note above about CPU saturation).

Just to give you an idea: with 200 threads we reached 3990 write tps (reference https://softinstigate.atlassian.net/wiki/x/gICM) on a quite low-end server.

Of course, with MongoDB shards and horizontal scaling of the RESTHeart layer you can reach much better results!

In summary, if you need to manage a massive stream of data, you need to build an appropriate infrastructure.

If you need further support, please use the RESTHeart professional services (http://restheart.org/support-packages.html), since that would be off-topic here.

hxrain commented 8 years ago

I can confirm that it is not a RESTHeart problem. With limited memory (4 GB), MongoDB's write speed slows down when inserting huge amounts of data.

ujibang commented 8 years ago

Hello @hxrain

Thanks for reporting this.

PS: for your issue, have a look at https://docs.mongodb.org/manual/faq/storage/#what-are-memory-mapped-files

hxrain commented 8 years ago

Thank you!