datastax / dsbulk

DataStax Bulk Loader (DSBulk) is an open-source, Apache-licensed, unified tool for loading into and unloading from Apache Cassandra®, DataStax Astra, and DataStax Enterprise (DSE).

Small modification to concatenate two fields #455

Open acscott opened 2 years ago

acscott commented 2 years ago

Thank you for a really powerful, fast product. I want to make a small modification for our purposes: concatenating two fields in the CSV output.

Say I wanted to concatenate the first and second fields instead of delimiting them with a comma.

I was thinking we could make this happen around this line: https://github.com/datastax/dsbulk/blob/fb1350127315c3a14b4862017c7435b35e2124a0/connectors/csv/src/main/java/com/datastax/oss/dsbulk/connectors/csv/CSVConnector.java#L362

But it doesn't look like you can modify anything there, only read.

Any hints?


adutra commented 1 year ago

Hi @acscott thanks for reaching out.

So far DSBulk has avoided transforming input data; in other words, it's an ETL tool without the T :-)

There have been requests to introduce the ability to transform data on the fly. However, the code you pointed at would not be the right place to do that: it lives inside a connector, which in this case is responsible solely for reading the input file and emitting records.

The right place to do that would be the core of DSBulk's engine, where we could imagine a transformer function Record -> Record that transforms the contents of each individual record before it is persisted to the database.

The function body could be provided in a scripting language and compiled to Java bytecode on the fly. Most likely we'd need to sandbox the execution context, since it must execute extremely fast and have no side effects such as disk or network I/O. We'd also need to come up with a clean way to initialize any persistent state the function requires.
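To sketch that idea in plain Java (a minimal illustration only: the map-based record model, the field names, and the `ConcatTransformer` class are assumptions, not DSBulk's actual Record API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Minimal sketch of a Record -> Record transformer. A record is
// modeled here as an ordered map of field name -> value; DSBulk's
// real Record interface is different.
public class ConcatTransformer implements UnaryOperator<Map<String, String>> {

  @Override
  public Map<String, String> apply(Map<String, String> record) {
    Map<String, String> out = new LinkedHashMap<>(record);
    // Merge the hypothetical fields "first" and "second" into one
    // field, dropping the originals. Assumes both fields are present.
    String merged = out.remove("first") + out.remove("second");
    out.put("first_second", merged);
    return out;
  }
}
```

The engine would apply such a function to every record emitted by the connector, between reading and writing.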

This would certainly be a nice addition to DSBulk, but I don't think the team has the bandwidth to implement it today, unfortunately.

The general guidance we give our users is to instead modify the input data to match your tables before loading. This is generally easy to achieve with command-line tools such as awk or sed.
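For example, here is a naive awk command that merges the first two fields of a CSV (a sketch assuming no quoted or embedded commas; the file names are placeholders):

```sh
awk -F',' 'BEGIN { OFS = "," }
{
  # Concatenate fields 1 and 2, then re-emit the remaining fields.
  out = $1 $2
  for (i = 3; i <= NF; i++) out = out OFS $i
  print out
}' input.csv > output.csv
```

For CSVs with quoted fields, a CSV-aware tool would be safer than a plain field split.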