Closed gparlamas closed 2 months ago
So far I have been using the code below. Is this the recommended / most efficient way?
struct Sample
{
int64_t data1;
int64_t data2;
int64_t data3;
};
std::ostream& operator << (std::ostream& str, const Sample& s)
{
str << "Sample:" << s.data1 << ' ' << s.data2 << ' '<< s.data3;
return str;
}
Client client(ClientOptions().SetHost("localhost"));
client.Execute("CREATE TABLE IF NOT EXISTS default.numbers (id UInt16, tm DateTime64(6), msg Array(UInt8)) ENGINE = Memory");
{
Block block;
auto buffer = std::make_shared<ColumnUInt8>();
auto id = std::make_shared<ColumnUInt16>();
auto tm = std::make_shared<ColumnDateTime64>(6);
auto blobArray = std::make_shared<ColumnArray>(std::make_shared<ColumnUInt8>());
auto blob = std::make_shared<ColumnUInt8>();
uint16_t counter{0};
auto AppendColumns = [&](Sample s)
{
id->Append(++counter);
auto now = std::chrono::system_clock::now().time_since_epoch();
auto micros = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
tm->Append(micros);
uint8_t buffer[1024]{};
memcpy(buffer, &s, sizeof(Sample));
ArrayInput input(buffer, sizeof(Sample));
blob->LoadBody(&input, input.Avail());
blobArray->AppendAsColumn(blob);
};
AppendColumns(Sample{122, 111, 133});
AppendColumns(Sample{22, 11, 33});
AppendColumns(Sample{2, 1, 3});
block.AppendColumn("id" , id);
block.AppendColumn("tm", tm);
block.AppendColumn("msg", blobArray);
client.Insert("default.numbers", block);
}
client.Select("SELECT id, tm, msg FROM default.numbers", [] (const Block& block)
{
for (size_t i = 0; i < block.GetRowCount(); ++i) {
auto id = block[0]->As<ColumnUInt16>()->At(i);
auto tm = block[1]->As<ColumnDateTime64>()->At(i);
auto blob = block[2]->As<ColumnArray>()->GetAsColumnTyped<ColumnUInt8>(i);
std::cout << id << ' ' << tm << ' ';
std::cout << *reinterpret_cast<Sample*>(blob->GetWritableData().data()) << std::endl;
}
}
);
Hi @gparlamas, sorry for the late reply.
First of all, 3.0.0 is not out yet, and there are no concrete plans for a release date. So please choose the most recent release (v2.5.1 as of now), or just use the head of master.
Second, your snippet looks about right, except for the Array creation. The easiest (and most performant) way is to use the ColumnArrayT type-aware wrapper:
auto blobArray = std::make_shared<clickhouse::ColumnArrayT<ColumnUInt64>>();
blobArray->Append(buffer); // buffer: a vector, a C array, or really anything iterable with std::begin() and std::end()
block.AppendColumn("msg", blobArray);
Also, you may want to use String instead of Array if your data is some sort of binary blob. That way it can be organized more efficiently on the sender/receiver side and may be more convenient to work with in SQL (but that depends heavily on what kind of data you have).
Hi @Enmk,
I'd rather use the latest master. It's been a while since your last official release, so I would prefer to use a version with the latest improvements / fixes, unless you think some of the recent changes are not battle-tested / ready for prime time. Is there any particular reason you are not planning the release of 3.0.0 yet?
Thanks for suggesting ColumnArrayT, it's exactly what I need!
Regarding using String instead of Array: it crossed my mind to convert the binary msgs into hex or base64 and store them as a String, but I'd rather not go down that path. The binary msgs won't be used interactively with SQL; they will be processed by another application that will load them and cast them back to their original binary form.
You can perfectly well store binary data in String, without encoding it as hex (or anything else).
As for 3.0.0, the main reason is basically not enough resources to push a couple of important features/fixes.
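For illustration, here is a minimal sketch of how a fixed-size struct could be turned into raw bytes suitable for a String column and read back without reinterpret_cast. It uses only the standard library (C++17). The helper names PackSample and UnpackSample are hypothetical and not part of clickhouse-cpp, and the byte layout is assumed to be the same on the writing and reading machines (same endianness and struct layout).

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>

struct Sample {
    int64_t data1;
    int64_t data2;
    int64_t data3;
};

// memcpy-based (de)serialization is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable_v<Sample>);

// Hypothetical helper: copy the object representation of `s` into a byte string.
// The result may contain embedded '\0' bytes; std::string stores them as-is.
inline std::string PackSample(const Sample& s) {
    std::string bytes(sizeof(Sample), '\0');
    std::memcpy(bytes.data(), &s, sizeof(Sample));
    return bytes;
}

// Hypothetical helper: rebuild a Sample from raw bytes.
// Returns std::nullopt when the size does not match.
// memcpy sidesteps the alignment and aliasing issues of reinterpret_cast.
inline std::optional<Sample> UnpackSample(std::string_view bytes) {
    if (bytes.size() != sizeof(Sample)) {
        return std::nullopt;
    }
    Sample s{};
    std::memcpy(&s, bytes.data(), sizeof(Sample));
    return s;
}
```

The string returned by PackSample could then be handed to a string-taking Append overload on the insert side, and the bytes read back from the column could be passed to UnpackSample on the select side.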
Do you have an example of how to do this? ColumnString::Append(std::string_view str) or ColumnString::LoadBody()? I would like to avoid nasty casts if possible. In terms of performance, is there a big difference between ColumnArrayT and String?
There is the Append family of methods, which take a string that can hold arbitrary binary data, including nulls (for every overload except the const char* one, obviously). Those methods copy the data into the column itself, which should be acceptable in most cases.
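The caveat about the const char* overload comes from how C strings work in general, not from anything specific to the library. A small standard-library sketch (the function names and kPayload are made up for this example) shows the difference between passing a bare pointer and passing an explicit pointer-plus-length view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Five bytes with a null in the middle: 'a', 'b', '\0', 'c', 'd'.
inline constexpr char kPayload[] = {'a', 'b', '\0', 'c', 'd'};

// A bare const char* carries no length, so the copy stops at the first '\0'.
inline std::string CopyViaCString(const char* data) {
    return std::string(data);
}

// A string_view carries an explicit length, so every byte is copied,
// including embedded nulls.
inline std::string CopyViaView(std::string_view data) {
    return std::string(data);
}
```

With kPayload, CopyViaCString keeps only the two bytes before the null, while CopyViaView(std::string_view(kPayload, sizeof(kPayload))) keeps all five. For binary payloads, the length-carrying forms (std::string or std::string_view) are therefore the ones to use.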
However, if you have data whose lifetime you can guarantee to exceed that of the ColumnString instance using it, you may use ColumnString::AppendNoManagedLifetime, which just references the value inside the column, without copying any memory on the client.
Regarding performance: from the server's standpoint, String and Array are organized somewhat similarly, so there should be no big difference. However, if you can avoid excessive copying (ColumnString::AppendNoManagedLifetime), the difference on the client side might be considerable, depending on your use case.
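To make the copy-versus-reference trade-off concrete, here is a rough standard-library sketch. CopyingColumn and ReferencingColumn are hypothetical stand-ins written for this example; they do not reproduce how ColumnString is actually implemented, they only model "copy on append" versus "keep a view into caller memory".

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a column that copies every appended value.
class CopyingColumn {
public:
    void Append(std::string_view value) { rows_.emplace_back(value); }
    std::string_view At(std::size_t i) const { return rows_[i]; }

private:
    std::vector<std::string> rows_;  // the column owns its bytes
};

// Hypothetical stand-in for a column that only references caller memory.
// The caller must keep every appended buffer alive, and unmodified,
// for as long as the column is in use.
class ReferencingColumn {
public:
    void AppendNoCopy(std::string_view value) { rows_.push_back(value); }
    std::string_view At(std::size_t i) const { return rows_[i]; }

private:
    std::vector<std::string_view> rows_;  // pointers into caller buffers
};
```

The referencing variant skips one allocation and one memcpy per row, which is where the client-side saving would come from. The cost is that a buffer destroyed or overwritten before the insert completes leaves the column pointing at invalid or changed data.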
And, by the way, LoadBody, SaveBody, and any other load-and-save methods on any of the columns are not expected to be used directly by library clients.
Yeah, I got the impression that LoadBody et al. are not really part of the public interface. I couldn't resist its zero-copy semantics... ;-)
Thanks for helping @Enmk
Hey folks,
I am planning to use your library to ingest some time series data into ClickHouse. I have a couple of questions:
Thanks in advance, George