Closed xevix closed 4 years ago
giraffez load
uses the multiload protocol (via Teradata's C++ TPT API). This protocol is most useful when dealing with a very large amount of data vs. ODBC or the CLIv2 with insert statements, however, Teradata server does not support LOB column types as seen here in the official documentation. In the case that you are working with LOB column types you will have to use something like giraffez insert
which uses Teradatas multi-statement mode to batch and send inserts. I've not dealt with LOB column types before so I'm unsure if this will work naively but is worth a shot (if a row of your data is greater than 64k, it will most likely fail). If it doesn't end up working, I think it could be explored to see how Teradata wants to insert LOB content and some changes could be made to giraffez that might facilitate LOB column types.
Ah, I see. Thanks for the swift and detailed reply, that's unfortunate load does not support LOB.
Is multi-statement mode less performant than multiload? If so, then I was thinking to go about this use case in a 2-step process:
In general, you cannot import or export LOB JSON data using FastLoad, MultiLoad, or FastExport protocols using either the legacy stand-alone load tools or Parallel Transporter. However, you can use these utilities to load or unload JSON data in the following cases: • You can load JSON data if it is less than 64 KB, and the target table defines the column as CHAR or VARCHAR. • If you set the transform group for the user to use a transform group that converts JSON to/from VARCHAR or VARBYTE.
NEW JSON()
to convert the varchar over.This isn't particularly ideal, but I'm hoping it should allow a way forward. Unfortunately I'm not too well versed in the C++ TPT API so I'm not sure what changes could be made in giraffez that would make the general process easier.
Yes, multi-statement mode is far less performant than multiload. It is essentially a parallel stream of insert statements vs multiload which is a binary protocol designed specifically for insert/update performance.
I'm not familiar with some of Teradata's newer features. It appears that the newer versions of their libraries may support LOB data types with the Update driver (aka multiload), but still having the limitation of only being able to support individual row sizes less than 64K. So if you are ok with that limitation it should work with giraffez if you have the appropriate version for both Teradata server and the local libraries you have installed. What version of Teradata server are you using and what version of the CLIv2 and TPTAPI do you have installed? The TPTAPI seems to have started support for the JSON data types and LOB data types for the Update driver in version 15.10. For background on what these libraries are, there is short description here.
Using giraffez CLI load command to put data in a table with a JSON column I get this error.
Looking online it seems like if I were using ODBC I could make this go away by adding this:
Unfortunately I can't see how to pass connection string options to Giraffez. Is there a way to do this? Or is the JSON column not supported?
Thanks!