Impetus / kundera

A JPA 2.1 compliant Polyglot Object-Datastore Mapping Library for NoSQL Datastores. Please subscribe to:
http://groups.google.com/group/kundera-discuss/subscribe
Apache License 2.0

Selecting maximum rows of data and preventing specific log from being triggered #712

Open cherry316 opened 9 years ago

cherry316 commented 9 years ago

Hello, good day.

I have 2 queries:

  1. Is it possible to fetch 1,000,000 rows of data from a Cassandra table? So far, I can only get 100,000 rows using the code below. If I set setMaxResults to anything from 200K to 1M, errors appear in my console.

    Query query = em.createQuery("SELECT u from Users u");
    List list = query.setMaxResults(100000).getResultList();

  2. Is it possible to prevent the logs below from appearing in my console? Fetching data from Cassandra into Java seems to take too long because of them. For example, if I fetch 50 rows of data, 50 such log lines also appear in my console.

    . . .
    09:22:35,795 INFO com.impetus.client.cassandra.CassandraClientBase Populating data for entity of clazz class entities.Users and row key 788349.
    09:22:35,811 INFO com.impetus.client.cassandra.CassandraClientBase Populating data for entity of clazz class entities.Users and row key 995749.
    09:22:35,811 INFO com.impetus.client.cassandra.CassandraClientBase Populating data for entity of clazz class entities.Users and row key 529979.
    09:22:35,811 INFO com.impetus.client.cassandra.CassandraClientBase Populating data for entity of clazz class entities.Users and row key 563775.
    . . .

Any help will be much appreciated.

Best Regards, Cherry

karthikprasad13 commented 9 years ago

@cherry316 Hey,

1) Can you check whether the errors are caused by the Java heap size? It would also help if you could share the error logs.

2) You can change your settings in log4j.properties file from "log4j.rootLogger=INFO,stdout, file" to "log4j.rootLogger=FATAL,stdout, file".

HTH! -Karthik
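
A narrower alternative to raising the root logger level is sketched below. It assumes the application's log4j 1.x properties file is the logging configuration actually in effect (on an application server with its own logging subsystem it may not be), and it only raises the level for the `com.impetus` loggers that appear in the output above.

```properties
# Leave log4j.rootLogger as it is and only quiet Kundera's own loggers.
# Loggers named com.impetus.* inherit this level: WARN and above only.
log4j.logger.com.impetus=WARN
```

With this line in place, the per-row INFO messages from `com.impetus.client.cassandra.CassandraClientBase` would be suppressed while other libraries keep logging at the root level.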

cherry316 commented 9 years ago

hi @karthikprasad13 , Thanks for your reply.

For point 1, it is not Java heap space; the errors from the logs are below. Just a thought: in cqlsh, I need to add a limit clause to view all the data I need, like this:

cqlsh:keyspace_name> select count(*) from entity_name limit 2000000;

Is it related to these errors?

    01:27:34,774 INFO  [com.impetus.kundera.persistence.EntityManagerImpl] (http-localhost-127.0.0.1-8083-1) Returning client instance for persistence unit cassandra_pu.
    01:27:34,821 INFO  [com.impetus.client.cassandra.query.CassQuery] (http-localhost-127.0.0.1-8083-1) Preparing index clause for query SELECT u from Users u
    01:27:51,508 ERROR [com.impetus.client.cassandra.thrift.ThriftClient] (http-localhost-127.0.0.1-8083-1) Error during executing find of column family users, Caused by: .
    01:27:51,508 ERROR [stderr] (http-localhost-127.0.0.1-8083-1) javax.persistence.PersistenceException: org.apache.thrift.transport.TTransportException: Frame size (121777733) larger than max length (16384000)!
    01:27:51,524 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at com.impetus.client.cassandra.thrift.ThriftClient.find(ThriftClient.java:964)
    01:27:51,524 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at com.impetus.client.cassandra.query.CassQuery.populateEntities(CassQuery.java:167)
    01:27:51,539 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at com.impetus.kundera.query.QueryImpl.fetch(QueryImpl.java:1089)
    01:27:51,539 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at com.impetus.kundera.query.QueryImpl.getResultList(QueryImpl.java:166)
    01:27:51,555 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at session.BasicCrud.getAllUserData(BasicCrud.java:58)
    01:27:51,555 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at CassServlet.service(CassServlet.java:57)
    01:27:51,555 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
    01:27:51,555 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329)
    01:27:51,571 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
    01:27:51,571 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
    01:27:51,571 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
    01:27:51,571 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50)
    01:27:51,571 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153)
    01:27:51,586 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
    01:27:51,602 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    01:27:51,602 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    01:27:51,602 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368)
    01:27:51,602 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
    01:27:51,617 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671)
    01:27:51,617 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930)
    01:27:51,617 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at java.lang.Thread.run(Unknown Source)
    01:27:51,617 ERROR [stderr] (http-localhost-127.0.0.1-8083-1) Caused by: org.apache.thrift.transport.TTransportException: Frame size (121777733) larger than max length (16384000)!
    01:27:51,633 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.transport.TFastFramedTransport.readFrame(TFastFramedTransport.java:148)
    01:27:51,633 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.transport.TFastFramedTransport.read(TFastFramedTransport.java:134)
    01:27:51,633 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    01:27:51,649 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
    01:27:51,649 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
    01:27:51,649 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
    01:27:51,664 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    01:27:51,664 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:802)
    01:27:51,664 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:786)
    01:27:51,664 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   at com.impetus.client.cassandra.thrift.ThriftClient.find(ThriftClient.java:910)
    01:27:51,680 ERROR [stderr] (http-localhost-127.0.0.1-8083-1)   ... 20 more
    01:27:51,680 INFO  [stdout] (http-localhost-127.0.0.1-8083-1) BasicCrud.countClassSize() took[millis]: 16875

    01:27:51,680 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/Cass_Implementation].[CassServlet]] (http-localhost-127.0.0.1-8083-1) Servlet.service() for servlet CassServlet threw exception: java.lang.NullPointerException
        at CassServlet.service(CassServlet.java:58)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.0.Final.jar:1.0.0.Final]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:329) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161) [jbossweb-7.0.13.Final.jar:]
        at org.jboss.as.jpa.interceptor.WebNonTxEmCloserValve.invoke(WebNonTxEmCloserValve.java:50) [jboss-as-jpa-7.1.1.Final.jar:7.1.1.Final]
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:153) [jboss-as-web-7.1.1.Final.jar:7.1.1.Final]
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) [jbossweb-7.0.13.Final.jar:]
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:368) [jbossweb-7.0.13.Final.jar:]
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877) [jbossweb-7.0.13.Final.jar:]
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:671) [jbossweb-7.0.13.Final.jar:]
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:930) [jbossweb-7.0.13.Final.jar:]
        at java.lang.Thread.run(Unknown Source) [rt.jar:1.7.0_71]

Thanks, Cherry

karthikprasad13 commented 9 years ago

@cherry316

Sorry for the late reply. We have looked into the issue, and it is not caused by a missing limit clause in the query. Kundera adds that internally once you set MaxResults.

The issue is caused by the limit on the framed transport size (the amount of data that can be sent at a time), which is set to 15 MB by default in Cassandra; exceeding it results in the error. However, it's not recommended to change this value because of various performance issues.

Karthik.
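
To put the two numbers from the `TTransportException` side by side, here is a small arithmetic sketch. The byte counts are copied from the stack trace above; the class and method names are made up for illustration. It shows that the single response frame was about 7.4 times the allowed maximum, so the same data would have to arrive in at least 8 smaller responses to stay under the limit.

```java
class FrameSizeCheck {
    // Values copied from the exception message in the log above.
    static final long FRAME_BYTES = 121_777_733L;
    static final long MAX_FRAME_BYTES = 16_384_000L;

    /** Smallest number of responses such that each could stay within maxBytes. */
    static long minPages(long totalBytes, long maxBytes) {
        return (totalBytes + maxBytes - 1) / maxBytes; // ceiling division
    }

    public static void main(String[] args) {
        double ratio = (double) FRAME_BYTES / MAX_FRAME_BYTES;
        System.out.println("frame / limit ratio: " + ratio);
        System.out.println("minimum pages: " + minPages(FRAME_BYTES, MAX_FRAME_BYTES)); // prints "minimum pages: 8"
    }
}
```

This is only a lower bound: it assumes the data splits evenly, which real rows of varying size will not do exactly.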

cherry316 commented 9 years ago

Hi @karthikprasad13

I'm sorry for the late reply as well. Your suggestion below doesn't work:

You can change your settings in log4j.properties file from "log4j.rootLogger=INFO,stdout, file" to "log4j.rootLogger=FATAL,stdout, file".

Also, regarding the first query, I have read that there is a setting I can change in cassandra.yaml.

Regards, Cherry

karthikprasad13 commented 9 years ago

@cherry316

Any updates?

You can add this file for configuring your logs.

Karthik.

cherry316 commented 9 years ago

hi @karthikprasad13,

I added the file you suggested, but such logs still appear in my server log.

...
06:00:22,539 INFO  [com.impetus.client.cassandra.CassandraClientBase] (http-localhost-127.0.0.1-8083-1) Returning cql query  INSERT INTO "LMNet"("protocol","frame","weight","halfDistance","bucket","state","speed","targetWeight","associatedLoader_id","equipApplication","azimuth","mainState","y","packet","accuracy_xy","hBucket","zLevel","healthMonitor","x","associatedDozer_id","license","messageGroup","type","mobile_id") VALUES(6,0x20413131313036303846433545323730354143333841433030303035303030303030303030303030303232303030303631454130303030,0.0,0,0,'STOP_EMPTY',0,60001.0,0,8,160,'A1',11286700,11,22,0,0,null,86466300,0,'trimouns','state group','dumper',6115) .
06:00:22,555 INFO  [com.impetus.client.cassandra.CassandraClientBase] (http-localhost-127.0.0.1-8083-1) Returning cql query  INSERT INTO "LMNet"("protocol","frame","weight","halfDistance","bucket","state","speed","targetWeight","associatedLoader_id","equipApplication","azimuth","mainState","y","packet","accuracy_xy","hBucket","zLevel","healthMonitor","x","associatedDozer_id","license","messageGroup","type","mobile_id") VALUES(6,0x2041313131303630383243353732373035464646423232414330303030374430364646464343434246303030363232303030303630454130303031,49100.0,0,0,'STOP_LOADED',0,60000.0,6,8,250,'A1',11280900,11,22,0,0,null,86464300,6,'trimouns','state group','dumper',6116) .
06:00:22,586 INFO  [com.impetus.client.cassandra.CassandraClientBase] (http-localhost-127.0.0.1-8083-1) Returning cql query  INSERT INTO "LMNet"("protocol","frame","weight","halfDistance","bucket","state","speed","targetWeight","associatedLoader_id","equipApplication","azimuth","mainState","y","packet","accuracy_xy","hBucket","zLevel","healthMonitor","x","associatedDozer_id","license","messageGroup","type","mobile_id") VALUES(6,0x204131313130363038413435433237303533343141414330303030394330364646464336414541303030363232303030303641454130303035,60010.0,0,0,'STOP_LOADED',0,60010.0,6,8,312,'A1',11278900,11,22,0,0,null,86465700,6,'trimouns','state group','dumper',6117) .
06:00:22,602 INFO  [com.impetus.client.cassandra.CassandraClientBase] (http-localhost-127.0.0.1-8083-1) Returning cql query  INSERT INTO "LMNet"("protocol","frame","weight","halfDistance","bucket","state","speed","targetWeight","associatedLoader_id","equipApplication","azimuth","mainState","y","packet","accuracy_xy","hBucket","zLevel","healthMonitor","x","associatedDozer_id","license","messageGroup","type","mobile_id") VALUES(6,0x2041313131303630383243353732373035464646423232414330303030374430364646464343434246303030363232303030303630454130303031,49100.0,0,0,'STOP_LOADED',0,60000.0,6,8,250,'A1',11280900,11,22,0,0,null,86464300,6,'trimouns','state group','dumper',6118) .
06:00:22,633 INFO  [com.impetus.client.cassandra.CassandraClientBase] (http-localhost-127.0.0.1-8083-1) Returning cql query  INSERT INTO "LMNet"("protocol","frame","weight","halfDistance","bucket","state","speed","targetWeight","associatedLoader_id","equipApplication","azimuth","mainState","y","packet","accuracy_xy","hBucket","zLevel","healthMonitor","x","associatedDozer_id","license","messageGroup","type","mobile_id") VALUES(6,0x204131313130363038413435433237303533343141414330303030394330364646464336414541303030363232303030303641454130303035,60010.0,0,0,'STOP_LOADED',0,60010.0,6,8,312,'A1',11278900,11,22,0,0,null,86465700,6,'trimouns','state group','dumper',6119) .
06:00:22,648 INFO  [stdout] (http-localhost-127.0.0.1-8083-1) Decoder.txtFileReader() : txt file to LMNet = OK ! took[milliseconds]: 162234 , 6120 data were added.

Cherry

idofmrsandeep commented 6 years ago

Hi cherry316,

1) I am not able to read more than 3000 rows from Cassandra; with that many records I end up with a read timeout error. Do we need to increase the read timeout value? If so, can you please tell me what value we can extend it to?

2) We have a requirement to read millions or even billions of rows from Cassandra. Can you please help me with sample code if you have any?

karthikprasad13 commented 6 years ago

Hi @idofmrsandeep

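You can find a sample project here: https://github.com/impetus-opensource/Kundera/blob/trunk/examples/basic-examples/downloadables/kundera-cassandra-example.zip?raw=true

If you are already using Kundera, you should be able to read more than 3000 rows without any additional configuration changes. Please share your logs.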

-Karthik

cherry316 commented 6 years ago

Hi Karthik,

Thank you for your response, but this issue was solved a long time ago.

Have a good day

Cherry Ardillos
System Support Manager/System Developer
Logimine http://www.logimine.com/ - Your partner in operations control

idofmrsandeep commented 6 years ago

Hi Karthik,

Thank you for your response. I need to read 1 million records from Cassandra using Kundera, but I am getting a timeout error from Kundera. Can you share a sample program that can be used to read 1 million records?

karthikprasad13 commented 6 years ago

Hi @idofmrsandeep

You can use pagination; see the code below:

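    // Kundera's Query interface (not javax.persistence.Query) provides setFetchSize() and iterate()
    com.impetus.kundera.query.Query query =
            (com.impetus.kundera.query.Query) em.createQuery("select p from Person p", Person.class);
    query.setFetchSize(1000000);

    // Iterate over the results instead of materializing them with getResultList()
    Iterator<Person> iter = query.iterate();

    while (iter.hasNext()) {
        Person p = iter.next();
        LOGGER.debug("result sysout: " + p.getPersonId());
    }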

-Karthik
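
As a generic illustration of what pagination means here, the sketch below pulls one bounded page at a time and exposes the rows through a plain `Iterator`. It is not Kundera's implementation: `PagedReadSketch`, `PagedIterator`, `loadPage`, and the offset-based loader are hypothetical stand-ins, and a real Cassandra client pages by server-side state rather than numeric offsets. The point is only that no single request ever asks for more than `pageSize` rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

class PagedReadSketch {

    /** Iterates over a large result set by requesting one bounded page at a time. */
    static class PagedIterator<T> implements Iterator<T> {
        private final BiFunction<Integer, Integer, List<T>> pageLoader; // (offset, limit) -> rows
        private final int pageSize;
        private int offset = 0;
        private boolean exhausted = false;
        private Iterator<T> current = Collections.emptyIterator();

        PagedIterator(BiFunction<Integer, Integer, List<T>> pageLoader, int pageSize) {
            if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
            this.pageLoader = pageLoader;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            // Load the next page only when the current one has been consumed.
            while (!current.hasNext() && !exhausted) {
                List<T> page = pageLoader.apply(offset, pageSize);
                offset += page.size();
                exhausted = page.size() < pageSize; // a short page means no more rows
                current = page.iterator();
            }
            return current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return current.next();
        }
    }

    /** Hypothetical datastore stand-in: returns the ids in [offset, offset + limit), capped at total. */
    static List<Integer> loadPage(int offset, int limit, int total) {
        List<Integer> rows = new ArrayList<>();
        for (int i = offset; i < Math.min(offset + limit, total); i++) rows.add(i);
        return rows;
    }

    /** Walks every row through the paged iterator and returns how many were seen. */
    static int countAll(int total, int pageSize) {
        Iterator<Integer> it = new PagedIterator<Integer>((off, lim) -> loadPage(off, lim, total), pageSize);
        int n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countAll(10_500, 1_000)); // prints 10500
    }
}
```

Under this scheme the page size bounds the cost of each individual request, which is why a smaller page size is the usual knob to try when single large requests time out.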

idofmrsandeep commented 6 years ago

Hi Karthik,

Thank you for your response, but we are facing an issue with connection timeouts. At most we are able to set setFetchSize to 3000; if we give more, Kundera throws a timeout exception. I increased the read_request_timeout_in_ms value in Cassandra to 50 minutes, but it made no difference. Can you please tell me which read timeouts I need to configure in both Kundera and Cassandra?

Your help is highly appreciated.

karthikprasad13 commented 6 years ago

Hi Sandeep,

Did you try with the above code? I am able to fetch 1 million rows with default Cassandra configurations.

-Karthik

idofmrsandeep commented 6 years ago

Karthik,

I think your Cassandra allows you to query that many records, whereas my Cassandra throws a request timeout error. Have you configured any timeouts? Can you please give me your read_request_timeout_in_ms value and the other timeout values from your cassandra.yaml file? Also, are you configuring any client-side timeout values other than "socket.timeout" and "max.wait"?

karthikprasad13 commented 6 years ago

I did not change any timeout values from default. Following are from my cassandra.yaml file:

read_request_timeout_in_ms: 5000
request_timeout_in_ms: 10000

Which version of Cassandra are you using?

-Karthik

idofmrsandeep commented 6 years ago

Hi Karthik,

I am using Cassandra 3.11.1 on port 9042. [cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]

My Cassandra table has 1 million rows, and if I run count(*) on the table from the CQL shell, I get the error below.

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}