brianmhess / cassandra-loader

Delimited file loader for Cassandra
Apache License 2.0
197 stars, 93 forks

Support JSON fields #33

Closed humbao closed 7 years ago

humbao commented 8 years ago

Can we have JSON fields to be inserted?

al3xandru commented 8 years ago

@humbao where do you get those JSON fields from? There is no JSON column type, so I'm wondering whether this is the optimal route to take here.

humbao commented 8 years ago

Sorry, I was unclear. I meant JSON inserts. Currently the code takes its field definition from the -schema flag, which defines a fixed field list that corresponds exactly to a delimited data source.

Implementing JSON inserts would allow for flexible, "schemaless" imports.

Each line in the data source would then be a complete JSON object that defines its own field list, and it would also allow for more complicated structures using the collection data types.

Refer to:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support
http://cassandra.apache.org/doc/cql3/CQL-2.2.html#insertJson

This doesn't have to be a part of this codebase (although much of the existing structure can be applied); it could be a derivative or a standalone tool.
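
To make the request above concrete, here is a minimal sketch of turning one line of JSON into a CQL `INSERT ... JSON` statement, the Cassandra 2.2 feature the links describe. It is not code from cassandra-loader: the function name `insert_json_statement` and the `demo_ks.users` keyspace and table are hypothetical, and it only builds the statement text rather than talking to a cluster.

```python
import json


def insert_json_statement(keyspace, table, line):
    """Build a CQL INSERT ... JSON statement from one line of JSON text.

    Hypothetical helper for illustration; keyspace and table are supplied
    by the caller because a JSON line does not name its destination.
    """
    # Parse first so that malformed input fails before any CQL is built.
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")

    # Re-serialize, then escape single quotes by doubling them,
    # which is how CQL string literals escape a quote.
    payload = json.dumps(record).replace("'", "''")
    return f"INSERT INTO {keyspace}.{table} JSON '{payload}'"


if __name__ == "__main__":
    line = '{"id": 1, "name": "O\'Brien", "tags": ["a", "b"]}'
    print(insert_json_statement("demo_ks", "users", line))
```

In a real loader it would probably be better to prepare `INSERT INTO ... JSON ?` once and bind each line as the parameter, which avoids the manual quoting entirely.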

al3xandru commented 8 years ago

@humbao JSON import makes sense. (As a side note, I don't think it actually needs to be line-based anymore.)

brianmhess commented 8 years ago

What you are suggesting here is loading JSON data - basically, a JSON parser followed by CQL insert. This project is more about delimited file loading. While I think the code could support both, the parsing bit is different. -schema is there for 2 reasons. First, it identifies the destination for the data (note that there is no -table or -keyspace option). Second, it lays out the order of the columns in the delimited file. While you wouldn't need the second, you'd need to do something about the first. I'd be happy to consider pulling that sort of thing into this code, but would need to think about how you specify things. Would it be a -format option (delimited versus JSON versus something else)? And each -format option would have different options for it (e.g., -schema for delimited, -table/-keyspace for JSON, etc).
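
As a rough illustration of the question raised above, the sketch below shows one way a `-format` option could select which other options are required. It is an assumption-laden mock-up in Python, not the loader's actual command line: only `-schema` exists per this thread, while `-format`, `-table`, `-keyspace`, the default of `delimited`, and the names `REQUIRED_BY_FORMAT` and `parse_options` are hypothetical.

```python
import argparse

# Hypothetical mapping from each format to the options it requires.
REQUIRED_BY_FORMAT = {
    "delimited": ["schema"],         # -schema names the table and column order
    "json": ["keyspace", "table"],   # JSON carries its own field names
}


def parse_options(argv):
    """Parse argv and check that the chosen format has its required options."""
    parser = argparse.ArgumentParser(prog="loader-sketch")
    parser.add_argument("-format", choices=sorted(REQUIRED_BY_FORMAT),
                        default="delimited")  # assumed default
    parser.add_argument("-schema")
    parser.add_argument("-keyspace")
    parser.add_argument("-table")
    opts = parser.parse_args(argv)

    # Collect the required options that were not supplied for this format.
    missing = [name for name in REQUIRED_BY_FORMAT[opts.format]
               if getattr(opts, name) is None]
    if missing:
        flags = ", ".join("-" + name for name in missing)
        raise ValueError(f"-format {opts.format} requires: {flags}")
    return opts


if __name__ == "__main__":
    opts = parse_options(["-format", "json", "-keyspace", "ks", "-table", "t"])
    print(opts.format, opts.keyspace, opts.table)
```

Keeping the per-format requirements in one table means a further format would need only a new entry and its own options, which seems to match the "something else" case mentioned above.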

brianmhess commented 7 years ago

JSON was added in v0.0.21.