Open tmvoe opened 4 years ago
Keyspace `default` is not available in Cassandra, using `sciencedb`.
The fourth and (to a lesser extent) fifth points seem to be the most severe.
Cassandra offers a variant of a secondary index that might offer better performance and more functionality (e.g. the `LIKE` operator): https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
As of right now, the following points have been completed:
- Indices and searches
- Allow filtering to authorized users
- Introduce a `cassandra` storage handler
The point "Where to store the new `cassandra` models?" was changed insofar as the directory structure of the models is now different: the `cassandra` models can now be found in `models/cassandra`.
`SQL` data model templates

The templates themselves have been copied, but the transformation is still in progress. Right now there is still a problem with `readAllCursor` in the DDM case, but that should be resolved, at least tentatively, by the end of today. After that, some functionality (e.g. deletion in a loop) still needs to be tested.
The keyspace (in this case `sciencedb`, since `default` does not seem to be allowed as a name) and the tables (including `db_migrated`) are created. To create the keyspace, the following steps were taken:

- The entrypoint `sh` file was copied from the Cassandra container and modified according to https://stackoverflow.com/a/42698847
- A `cql` file with the necessary information for creating the keyspace (for now named `cassandra-keyspace.cql`) is copied by `docker-compose` to the Docker container in the initialization directory and executed by the aforementioned change in the entrypoint `sh` file.

The tables are created by migration files (see the migration template).
`cassandra` data models and adapters

The `storageHandler` is created by `models/index.js`. Unfortunately, this handler is also required to test the connection to Cassandra during the startup of the servers, which leads to repeated messages about the models being loaded. This appears to be only a cosmetic problem, so it has not been fixed yet. What has not yet been done is providing configuration of the Cassandra driver via a JSON file, so existing Cassandra servers cannot yet be accessed.
A bug that was only discovered today seems to require a change of the schema in case of DDM files. The problem is as follows:
In the case of DDMs, the function `readAllCursor` (see above) needs to collect records from different cursors and provide a list of entries formed from them. In the case of SQL, a custom ordering can be provided, but Cassandra doesn't allow for that unless someone is willing to use up a lot of additional storage space by duplicating the tables for each column that ordering should be possible for. Instead, Cassandra provides its own ordering based on the token (a hash) of the ID value. To provide proper pagination, the full list of records needs to be ordered in the same way (otherwise, a sorting cursor would have no meaning). The token can only be generated by Cassandra itself. For the local adapter this is no problem, since the token can be queried alongside the model attributes. The remote adapter, however, can only send a query according to the GraphQL schema, so the token needs to be added to the schema. Unfortunately, the token then looks like just another attribute of the schema and can also be accessed by the end user (e.g. in GraphiQL). It should be determined whether this problem can be avoided.
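To illustrate the ordering requirement, here is a minimal sketch of how records collected from several adapters could be merged in token order before a page is cut out. It assumes that each record carries its token in a field named `token` and that the cluster uses the Murmur3 partitioner, whose tokens are 64-bit signed integers. The names `compareTokens` and `mergeByToken`, the `after`/`first` parameters, and the sample records are hypothetical and not taken from the actual code base.

```javascript
// Compare two tokens as 64-bit signed integers.
// BigInt is used because such tokens can exceed Number.MAX_SAFE_INTEGER.
function compareTokens(a, b) {
  const x = BigInt(a);
  const y = BigInt(b);
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

// Merge the record lists returned by several adapters into one list that is
// ordered by token, then return up to `first` records following the cursor token `after`.
function mergeByToken(recordLists, { after = null, first = 10 } = {}) {
  const merged = recordLists
    .flat()
    .sort((r1, r2) => compareTokens(r1.token, r2.token));
  if (after === null) return merged.slice(0, first);
  const start = merged.findIndex((r) => compareTokens(r.token, after) > 0);
  return start === -1 ? [] : merged.slice(start, start + first);
}

// Hypothetical sample data: tokens are passed around as strings.
const localRecords = [
  { id: 'a', token: '-5' },
  { id: 'c', token: '9007199254740993' },
];
const remoteRecords = [
  { id: 'b', token: '9007199254740992' },
  { id: 'd', token: '-9223372036854775808' },
];
const firstPage = mergeByToken([localRecords, remoteRecords], { first: 2 });
const secondPage = mergeByToken([localRecords, remoteRecords], { after: '-5', first: 2 });
```

The sketch only works if every adapter, local or remote, can deliver the token together with the record, which is exactly why the token ends up in the GraphQL schema.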
The following text is from Integrate Cassandra Support into Cenzontle
In order to implement the new storage type `cassandra`, the following steps suggest themselves.

Cassandra Setup / simple migration
First, set up a "sandbox" environment with Docker using the latest stable Cassandra Docker image and add it to our sandbox setup. Write a simple Cassandra Query Language script `setup_cassandra_db.js` to set up the Cassandra database for your needs, i.e. create a `KEYSPACE default` and all `TABLE`s required for your data models. Add an additional table `CREATE TABLE db_migrated ( migrated_at timeuuid PRIMARY KEY )`, which will hold a single row if and only if `setup_cassandra_db.cql` has been executed, i.e. the last thing the script does is insert a row into the `db_migrated` table and thus mark the DB as ready for usage. The first thing the `setup_cassandra_db.js` script does is check whether the database has already been migrated. The setup procedure will be executed if and only if the table `db_migrated` does not exist or does not contain a valid row. Extend the shell script used to start up the backend GraphQL server to (i) wait for the Cassandra server to be available and (ii) run the migration script.

Information about how to obtain automatically generated `timeuuid`s can be found here.

Indices and searches
Note that in the context of Cenzontle it is highly recommended to create indices on all table columns in order to enable exhaustive searches.
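The migration check and the index creation described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual migration script: `client` is expected to behave like a connected client of the Node.js Cassandra driver (an `execute(query, params, options)` method resolving to an object with `rows`), the keyspace name `sciencedb` is taken from the status comment above, and the `person` table, its `name` column, the index name, and the function names are purely hypothetical.

```javascript
// Returns true only if db_migrated exists AND already contains a row.
async function isMigrated(client, keyspace) {
  const tables = await client.execute(
    'SELECT table_name FROM system_schema.tables WHERE keyspace_name = ? AND table_name = ?',
    [keyspace, 'db_migrated'],
    { prepare: true }
  );
  if (tables.rows.length === 0) return false;
  // Keyspace and table names cannot be bound as parameters, hence the template string.
  const marker = await client.execute(
    `SELECT migrated_at FROM ${keyspace}.db_migrated LIMIT 1`
  );
  return marker.rows.length > 0;
}

// Runs the given CQL statements once and marks the database as migrated
// as the very last step. Resolves to false if nothing had to be done.
async function migrate(client, keyspace, statements) {
  if (await isMigrated(client, keyspace)) return false;
  await client.execute(
    `CREATE TABLE IF NOT EXISTS ${keyspace}.db_migrated (migrated_at timeuuid PRIMARY KEY)`
  );
  for (const cql of statements) {
    await client.execute(cql);
  }
  // now() lets Cassandra generate the timeuuid.
  await client.execute(
    `INSERT INTO ${keyspace}.db_migrated (migrated_at) VALUES (now())`
  );
  return true;
}

// Hypothetical data model: one table plus an index on its non-key column.
const exampleStatements = [
  'CREATE TABLE IF NOT EXISTS sciencedb.person (id uuid PRIMARY KEY, name text)',
  'CREATE INDEX IF NOT EXISTS person_name_idx ON sciencedb.person (name)',
];
```

Because the marker row is written last, a migration that is interrupted halfway leaves `db_migrated` empty and is simply retried on the next start; the `IF NOT EXISTS` clauses keep that retry harmless.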
Introduce the new storage type `cassandra`

Copy `SQL` data model templates

Make two new storage types available: `cassandra` data models and `cassandra-adapter` distributed data model (DDM) adapters. Copy the code-generator templates that are responsible for generating the data model layer modules for `SQL` data models. Take the document that defines the `SQL` statements implementing the respective data model layer functions and translate these statements to `CQL`, which is very similar to `SQL`.

Allow filtering to authorized users
Introduce a new authorization check in all read operations that execute database searches. Currently, only `read` will be checked on a given data model name or adapter name. Add `search` to the checks. If the user has `search` authorization, append `ALLOW FILTERING` to the generated `CQL` queries.

Introduce a `cassandra` storage-handler

Currently, Cenzontle's `models_index.js` script initializes the data models, including their connection to relational databases through Sequelize.

Where to store the new `cassandra` models?

Create a new directory `models-cassandra` in which Node.js modules for data models of the new storage type `cassandra` will be stored.

Initialize `cassandra` data models and adapters

Provide a JSON configuration file in `./config/cassandra.json`
that has all data needed by the Node.js Cassandra Driver. Adjust `models_index.js` to read in this config, initialize the Driver and set it in each `cassandra` data model module class as a constant property `storageHandler`. See this post for how to define class constants in ECMAScript 6. Basically use something like the following snippet.

The above has the advantage that the storage handler is available both on the `Person` class level and on the level of `Person` records (instances):
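As a hedged illustration of such a constant property, one possible way (not necessarily the one from the referenced post) is to define `storageHandler` as a non-writable property on both the class and its prototype. The helper `attachStorageHandler`, the fields of `Person`, and the stub handler object are hypothetical; in a real setup the handler would be the initialized Cassandra driver client.

```javascript
// Attach `handler` as a read-only property to the class itself and to its
// prototype, so that both the class and every instance can reach it.
function attachStorageHandler(modelClass, handler) {
  const descriptor = {
    value: handler,
    writable: false,
    configurable: false,
    enumerable: false,
  };
  Object.defineProperty(modelClass, 'storageHandler', descriptor);
  Object.defineProperty(modelClass.prototype, 'storageHandler', descriptor);
  return modelClass;
}

// Hypothetical data model class.
class Person {
  constructor({ id, name }) {
    this.id = id;
    this.name = name;
  }
}

// Stand-in for the initialized driver client (assumption for this sketch).
const stubHandler = { keyspace: 'sciencedb' };
attachStorageHandler(Person, stubHandler);

const person = new Person({ id: 1, name: 'Ada' });
const sameHandler = Person.storageHandler === person.storageHandler;
```

Since the property is not enumerable, it does not show up when a record is serialized, and since it is not writable, it cannot be replaced accidentally after initialization.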