SqliteModernCpp / sqlite_modern_cpp

The C++14 wrapper around sqlite library
MIT License
903 stars 156 forks source link

unnecessary calls to sqlite3_{column,value}_bytes{,16} in getting text #200

Closed marciso closed 4 years ago

marciso commented 4 years ago

in functions inline void get_col_from_db(database_binder& db, int inx, std::string & s), inline void get_val_from_db(sqlite3_value *value, std::string & s) and corresponding std::u16string variants, there is a call to the function to get text length, before getting the text, but the result of this call is not used.

I have commented these calls (for performance reason), the code still works as expected.

zauguin commented 4 years ago

Please see the dev version, the code is changed there to respect the returned value. This is very useful for databases which might contain NULL bytes in string values. Also it enhances performance by not requiring a strlen while constructing the string.

marciso commented 4 years ago

Right, yes, I was initially a bit concerned about the NUL in the output but, AFICT, this can only happen when converting from blobs (you could argue this is API misuse: use std::vector instead for this case). By looking at the sqlite source code, you are right that in most of the cases there will be performance gain from calling sqlite3_value_bytes over strlen (although when conversion happens from a different type, this may invoke malloc...; also not sure for small strings.).

Briefly looked at the dev implementation for these methods. Spotted small problem with the sequence points: I think you want to explicitly call sqlite3_value_text before sqlite3_value_bytes, e.g.

auto * aux = sqlite3_value_text(...);
return std::string(aux, sqlite3_value_bytes(...));

As per documentation to sqlite, conversion to text may happen on the call to sqlite3_value_text, then following sqlite3_value_bytes will take the length of the converted value. Otherwise, you might end up measuring the length of a different beast...

zauguin commented 4 years ago

Right, yes, I was initially a bit concerned about the NUL in the output but, AFICT, this can only happen when converting from blobs (you could argue this is API misuse: use std::vector instead for this case).

This can happen with strings, but for such strings expressions might behave in odd ways:

If a non-negative fourth parameter is provided to sqlite3_bind_text() or sqlite3_bind_text16() or sqlite3_bind_text64() then that parameter must be the byte offset where the NUL terminator would occur assuming the string were NUL terminated. If any NUL characters occur at byte offsets less than the value of the fourth parameter then the resulting string value will contain embedded NULs. The result of expressions involving strings with embedded NULs is undefined.

By looking at the sqlite source code, you are right that in most of the cases there will be performance gain from calling sqlite3_value_bytes over strlen (although when conversion happens from a different type, this may invoke malloc...; also not sure for small strings.).

Briefly looked at the dev implementation for these methods. Spotted small problem with the sequence points: I think you want to explicitly call sqlite3_value_text before sqlite3_value_bytes, e.g.

auto * aux = sqlite3_value_text(...);
return std::string(aux, sqlite3_value_bytes(...));

As per documentation to sqlite, conversion to text may happen on the call to sqlite3_value_text, then following sqlite3_value_bytes will take the length of the converted value. Otherwise, you might end up measuring the length of a different beast...

Good catch, please create a PR.

aminroosta commented 4 years ago

Merged the relevant PR. (https://github.com/SqliteModernCpp/sqlite_modern_cpp/pull/201)

Thank you all, please reopen if any issue.