Closed: maxfreu closed this issue 4 months ago
It's a little hard to tell what you're trying to do; can you share some example code of what you would like to do, or what you're currently doing and the problems you're having? Having a concrete code example to work with helps in answering your question.
Actually, my question was imprecise. It's really about how blob data can be written without unnecessary copies.
```julia
using DataFrames, SQLite

# My data looks like this, just with 80 million rows:
data = [rand(UInt16, 10) for _ in 1:10]

# What I want is to write the data as a contiguous blob (NOT serialized Julia structs).
# I achieve this like so:
data2blob(v) = collect(reinterpret(UInt8, v))

df = DataFrame(:foo => data2blob.(data))
db = SQLite.DB("deleteme.sqlite")
SQLite.load!(df, db, "foo")
close(db)
```
The resulting file has a blob column with the correct data written to it. However, I'd like to avoid calling `collect` on 1.6 GB of data. But when I leave it out, like so:

```julia
df = DataFrame(:foo => reinterpret.(UInt8, data))
```

the Julia types get serialized somehow before being written. That makes some sense, but then I can't read the file from other programs anymore. Maybe it would be good to special-case reinterpreted arrays of basic integer types somewhere in the code?
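For context, the serialization fallback presumably kicks in because `reinterpret` does not return a `Vector{UInt8}`, so any blob path that dispatches on the concrete type never sees it. A minimal check in plain Julia:

```julia
# reinterpret returns a lazy Base.ReinterpretArray view, not a materialized Vector,
# so code that dispatches on the concrete Vector{UInt8} type misses it.
v = rand(UInt16, 10)
bytes = reinterpret(UInt8, v)

bytes isa AbstractVector{UInt8}  # true: it behaves like a byte vector
bytes isa Vector{UInt8}          # false: it is a no-copy view
length(bytes)                    # 20, two bytes per UInt16
```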
Yeah, that makes sense to me. It might just be that we're supporting `Vector{UInt8}`, but we could make it `AbstractVector{UInt8}` to store as blobs.
Oh yes, relaxing to `AbstractVector{UInt8}` is way better than specializing for reinterpret arrays. Where would that go?
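A minimal sketch of the dispatch relaxation being discussed, using a made-up `tosqliteblob` stand-in rather than SQLite.jl's actual internal method names:

```julia
# Hypothetical stand-in for whatever method decides how a value is bound as a blob;
# the name is illustrative, not SQLite.jl's real internals.

# Too narrow: only matches a materialized Vector{UInt8}.
# tosqliteblob(v::Vector{UInt8}) = v

# Relaxed signature: also accepts ReinterpretArray, SubArray, etc., without a copy.
tosqliteblob(v::AbstractVector{UInt8}) = v

data = rand(UInt16, 10)
blob = tosqliteblob(reinterpret(UInt8, data))  # dispatches fine, no collect needed
```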
Hi! I have a DataFrame column containing vectors of 10 Int16s. I would like to save each vector as 20 bytes of blob data. How can I do that? Right now I work around it by converting the reinterpreted bytes to a string, but that has issues with null termination etc.
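As a sketch of the underlying conversion (independent of SQLite.jl), `reinterpret` produces the 20-byte view directly, and also shows why the string workaround runs into embedded NULs:

```julia
row = Int16[1, 2, 3, 256, -1, 0, 7, 8, 9, 10]  # 10 Int16s, as in the question
bytes = reinterpret(UInt8, row)                 # 20-byte view, no copy

# Int16(0) (and, on little-endian machines, Int16(256)) contributes 0x00 bytes,
# which is why round-tripping through a string hits null-termination problems.
s = String(collect(bytes))  # the workaround: a string containing embedded NULs
```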