datastax / dsbulk

DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading data into and unloading data from Apache Cassandra(R), DataStax Astra and DataStax Enterprise (DSE).
Apache License 2.0

Prevent insert queries from failing to parse when using "vector" as a column name #495

Closed absurdfarce closed 4 months ago

absurdfarce commented 5 months ago

The problem appeared to be that "vector" was added as a keyword but wasn't specified as an unreserved keyword, so the parser rejected it wherever an identifier (such as a column name) was expected. I added it to the parser's collection of native type names, which (a) also makes it an unreserved keyword and (b) seems more correct on its face anyway.
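
For readers unfamiliar with the reserved/unreserved distinction, here is a minimal Python sketch of the idea. It is a toy model, not DSBulk's actual grammar: the keyword sets are small hypothetical subsets and `is_valid_unquoted_identifier` is an illustrative helper, not a DSBulk or driver API. It only shows why a word that is a keyword but not an unreserved one cannot be used as a bare column name, and why adding it to the native type names fixes that.

```python
# Toy model of reserved vs. unreserved keywords (hypothetical subsets,
# not the real CQL grammar used by DSBulk).
RESERVED = {"select", "from", "insert", "into", "values", "where"}
NATIVE_TYPES_BEFORE = {"int", "text", "float"}
NATIVE_TYPES_AFTER = NATIVE_TYPES_BEFORE | {"vector"}


def is_valid_unquoted_identifier(word, keywords, unreserved):
    """A word may be used unquoted if it is not a keyword at all,
    or if it is a keyword that is explicitly marked as unreserved."""
    w = word.lower()
    return w not in keywords or w in unreserved


# Before the fix: "vector" is a keyword, but nothing marks it as unreserved.
keywords_before = RESERVED | NATIVE_TYPES_BEFORE | {"vector"}
before = is_valid_unquoted_identifier("vector", keywords_before, NATIVE_TYPES_BEFORE)

# After the fix: "vector" is a native type name, and native type names
# count as unreserved keywords.
keywords_after = RESERVED | NATIVE_TYPES_AFTER
after = is_valid_unquoted_identifier("vector", keywords_after, NATIVE_TYPES_AFTER)

print(before, after)
```

In this model the only change between the two cases is which set "vector" belongs to, which mirrors the description of the fix above.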

absurdfarce commented 5 months ago

Possibly useful for testing purposes:

CREATE KEYSPACE test
  WITH REPLICATION = { 
   'class' : 'SimpleStrategy', 
   'replication_factor' : 1 
  };

CREATE TABLE test.foo (
    i int PRIMARY KEY,
    vector vector<float, 3>
);

i,vector
1,"[8, 2.3, 58]"
2,"[1.2, 3.4, 5.6]"
5,"[23, 18, 3.9]"
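
As a quick sanity check on the sample data before loading it, a minimal Python sketch along these lines could confirm that every row carries exactly three numeric components, matching the `vector<float, 3>` column. This is not part of DSBulk; `SAMPLE` and `parse_rows` are hypothetical names used only for illustration.

```python
import csv
import io
import json

# The sample CSV from this comment, inlined so the sketch is self-contained.
SAMPLE = '''i,vector
1,"[8, 2.3, 58]"
2,"[1.2, 3.4, 5.6]"
5,"[23, 18, 3.9]"
'''


def parse_rows(text, dimension=3):
    """Parse the CSV and return {i: [floats]}, checking the vector length."""
    rows = {}
    for record in csv.DictReader(io.StringIO(text)):
        # Each vector cell is a bracketed list, which is also valid JSON.
        values = [float(x) for x in json.loads(record["vector"])]
        if len(values) != dimension:
            raise ValueError(f"row {record['i']}: expected {dimension} components")
        rows[int(record["i"])] = values
    return rows


rows = parse_rows(SAMPLE)
print(sorted(rows))
```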

select vector from test.foo order by vector ann of [3.4, 7.8, 9.1] limit 1;

absurdfarce commented 4 months ago

Thanks for the review @adutra; your points are well-taken.

I think I have a good working impl incorporating your suggestions... please take another look when you have a sec!