brianmhess / cassandra-loader

Delimited file loader for Cassandra
Apache License 2.0
197 stars, 93 forks

Does not work with Cassandra 3.0 #25

Closed gw0 closed 8 years ago

gw0 commented 8 years ago

Because Cassandra 3.0 changed some internal tables and older versions of drivers try to access them, both tools crash when trying to connect. Please update the Java driver.

# ./cassandra-unloader -f xxx.csv -host 1.2.3.4 -schema xxx.foo
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /1.2.3.4:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
        at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:223)
        at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
        at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1272)
        at com.datastax.driver.core.Cluster.init(Cluster.java:158)
        at com.datastax.driver.core.Cluster.connect(Cluster.java:248)
        at com.datastax.loader.CqlDelimUnload.setup(CqlDelimUnload.java:329)
        at com.datastax.loader.CqlDelimUnload.run(CqlDelimUnload.java:350)
        at com.datastax.loader.CqlDelimUnload.main(CqlDelimUnload.java:444)

xqchen1 commented 8 years ago

Could we get an update on this? I would like to try Cassandra 3.0 if cassandra-loader is compatible with it. I loaded data using this tool in Cassandra 2.1.5 and it's very fast.

al3xandru commented 8 years ago

@xqchen1 can you please try this branch https://github.com/al3xandru/cassandra-loader/tree/issue-25 and report back? thanks

xqchen1 commented 8 years ago

Thanks Alex. I finally got a chance to test the Cassandra Loader with C* 3.3 on a new cluster. It's working great!!!!!

Thanks again for all your help.

ghost commented 8 years ago

How fast is it?

al3xandru commented 8 years ago

@hzliang if your question is about how fast the loader is, there are way too many variables to take into account to talk any numbers. It starts with the size of your cluster, the level of parallelism the machine running the loader can handle, the size of each "record", and it goes all the way to tuning different parameters on both server side and the loader.

ghost commented 8 years ago

Thank you very much for your reply. I am running Cassandra 2.2.0 on six nodes, but I only get about 2000 rows/min when using cassandra-loader. That is too slow for me to load 3 TB of data.

xqchen1 commented 8 years ago

I loaded 62 million rows of data into a table with 150 columns. The load rate was about 8200 rows per second on a 4-node cluster with 16 cores (32 logical), SSDs, and 264GB of memory. On a small cluster of 4 VM nodes, my load rate was about 1000 rows per second with 1 thread. You want to create smaller CSV files so you can run your job in parallel. If compaction is falling behind, you may want to lower the number of threads. You need to monitor where the bottleneck is. It's a tuning process.
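
The suggestion above to create smaller CSV files could be sketched roughly as follows. This is only an illustration in Python and is not part of cassandra-loader: the function name `split_csv`, the `.partNNNN.csv` naming scheme, and the assumption that the input has no header row are all hypothetical choices made for the example. It reads with the `csv` module so that quoted fields containing newlines stay in one record; as a side effect, quoting in the output is normalized to the module's default (minimal) style.

```python
import csv
import os


def split_csv(path, rows_per_chunk, out_dir, delimiter=","):
    """Split one delimited file into chunk files of at most rows_per_chunk records.

    Hypothetical helper for illustration; assumes the input has no header row.
    Returns the list of chunk file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    out = None
    writer = None
    with open(path, newline="") as src:
        for i, row in enumerate(csv.reader(src, delimiter=delimiter)):
            # Start a new chunk file every rows_per_chunk records.
            if i % rows_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = os.path.join(
                    out_dir, f"{base}.part{len(chunk_paths):04d}.csv"
                )
                out = open(chunk_path, "w", newline="")
                writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
                chunk_paths.append(chunk_path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return chunk_paths
```

Each resulting chunk can then be handed to a separate loader invocation. How many to run at once is part of the same tuning process: if compaction falls behind, run fewer in parallel.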

brianmhess commented 8 years ago

I didn't close this issue, but yes we support 3.0 now.