Open kiranzo opened 1 month ago
@kiranzo yes, the bintablefile doesn't support string data at all, because strings have variable length. The whole point of the bintablefile is fast random access to records anywhere in the file (like reading the last records without having to read the whole file up front), so the records need to have a fixed width, i.e. be made only of primitive data types with a fixed size, like ints, floats, and booleans.
For storing records with strings, I'd recommend Apache ORC or Parquet.
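To illustrate why fixed width matters, here is a minimal sketch using only Python's standard `struct` module. It is not the bintablefile API or its on-disk format; the record layout (one int32, one float64, one bool) and the helper names `write_records`, `read_record`, and `read_last_record` are made up for this example. With a constant record size, the byte offset of record `i` is just `i * size`, so any record can be read with a single seek.

```python
import io
import struct

# Hypothetical layout: int32, float64, bool; "<" means little-endian, no padding.
RECORD = struct.Struct("<id?")


def write_records(buf, rows):
    """Append each row as one fixed-size binary record."""
    for row in rows:
        buf.write(RECORD.pack(*row))


def read_record(buf, index):
    """Jump straight to record `index` without reading earlier records."""
    buf.seek(index * RECORD.size)
    return RECORD.unpack(buf.read(RECORD.size))


def read_last_record(buf):
    """Read the final record by seeking backwards from the end of the file."""
    buf.seek(-RECORD.size, io.SEEK_END)
    return RECORD.unpack(buf.read(RECORD.size))


buf = io.BytesIO()
write_records(buf, [(1, 1.5, True), (2, 2.5, False), (3, 3.5, True)])
print(read_record(buf, 1))
print(read_last_record(buf))
```

A variable-length string field would break the `index * RECORD.size` arithmetic, because the position of each record would then depend on the contents of every record before it.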
I tried ORC, and wow, the output is really small on my data compared to Parquet and Feather at max compression. Thank you for the suggestion. If variable length is the problem, maybe strings could be represented as padded byte arrays, with a maximum length as a required field parameter?
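A rough sketch of what I mean, in plain Python. The function names `encode_fixed` and `decode_fixed` and the choice of NUL padding are my own assumptions, not anything the library provides. Note that `max_len` here counts UTF-8 bytes, not characters, and that a string ending in NUL characters would not round-trip.

```python
def encode_fixed(text: str, max_len: int) -> bytes:
    """Encode text as UTF-8 and right-pad with NUL bytes to exactly max_len bytes."""
    raw = text.encode("utf-8")
    if len(raw) > max_len:
        raise ValueError(f"string needs {len(raw)} bytes, field allows {max_len}")
    return raw.ljust(max_len, b"\x00")


def decode_fixed(field: bytes) -> str:
    """Strip the NUL padding and decode the remaining bytes back to text."""
    return field.rstrip(b"\x00").decode("utf-8")


field = encode_fixed("héllo", 16)
print(len(field))
print(decode_fixed(field))
```

Every encoded field has the same size, so the record width stays constant, at the cost of wasted space for short strings and a hard error for long ones.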
I'm researching different formats for storing table data, and I came across this one. When I tried to test it, I got lots of Pydantic validation errors.
Errors:
So, it doesn't support string data at all?