NVlabs / FPSci

Aim Training Experiments
Other
70 stars 23 forks source link

use parameterized sqlite for `insertRowIntoDB` #385

Open jspjutNV opened 2 years ago

jspjutNV commented 2 years ago

Opening a PR to consider parameterized SQL queries for insertion to avoid injection problems from either experiment configuration, or user responses for text entry. This approach should allow characters like ' to be entered into the database without error. An alternative approach (#386 ) would be to translate those characters into something else before adding them to the non-parameterized queries.

Parameterized queries are a well known best practice for all types of SQL, though sqlite's C interface changes significantly when doing so.

This one is exclusive with #386 (only one or the other should be merged).

This closes #383

bboudaoud-nv commented 2 years ago

I'll note that this approach (as currently implemented) loses the types for our data in favor of treating everything as text. If we wanted to retain the types we could potentially do some (internal) logging when tables are created so we know the right types. Still I like the approach of using these methods ahead of the current approach.

Last but not least, only either this PR or #386 should be merged as they solve the same problem differently.

jspjutNV commented 2 years ago

I'll note that this approach (as currently implemented) loses the types for our data in favor of treating everything as text. If we wanted to retain the types we could potentially do some (internal) logging when tables are created so we know the right types. Still I like the approach of using these methods ahead of the current approach.

I'm pretty sure types could be preserved with a bit more work on this. You can specify types, it's just that our general interface doesn't expose those quite as easily.

Still, you could do a dynamic type case statement to specify each type correctly. If someone gets some time, that would be a nice improvement to the current PR.

The other required improvement before merging this is to apply the same style to any other table write functions there may be.

jspjutNV commented 2 years ago

I should note that the way all sql helpers work in top of tree is to write only strings/text to the database, so this style of change is not a regression from what we're doing already. Obviously we could fix it to submit the correct types to the database, but we've never done that before. I believe that fix could be done after merging this since it fixes a more important bug.

That said, insertRowsIntoDB still needs to be updated, and we need to do some testing to make sure this PR is working.

jspjutNV commented 2 years ago

0953f9e fails to log because there's a variable limit on sqlite and we exceed it. I'm working on breaking the command down into smaller size sub-commands to fit within the limit.

jspjutNV commented 2 years ago

This PR is nearly matched to the behavior of master but fixing #383.

There is still one place where the old style sqlite interface is being used in FPSciLogger::updateSessionEntry where a 1-off update is sent. We should decide whether to update that to the v2 interface or let it stay until the bigger fix where we pass data types all the way through.

jspjutNV commented 2 years ago

If we run into performance issues in the future, we may want to look here for whether changing from SQLITE_TRANSIENT to SQLITE_STATIC is a good idea. It does require tracking when the strings are destroyed to use the static version.

jspjutNV commented 2 years ago

@bboudaoud-nv found a possible bug in this PR and I saw it too. A bunch of nulls start showing up in a few of the tables. We need to investigate more before merging this.

bboudaoud-nv commented 2 years ago

I also think that having some insert statements logged to log.txt is less valuable now that the value fields are logged as the ? characters from the INSERT query rather than literals.

jspjutNV commented 2 years ago

I also think that having some insert statements logged to log.txt is less valuable now that the value fields are logged as the ? characters from the INSERT query rather than literals.

I agree. Looks like I accidentally left my debugging prints in there. Let's remove them before merging and get just the most useful subset of log prints (if any).

Continuing to investigate, @bboudaoud-nv has reported seeing some SQL_BUSY return codes in the failing case, which makes it seem like we're hitting race conditions. This may be a good thing suggesting that we're increasing the concurrency of our submits to the database. However, we'll need to do the work to ensure we retry the failed queries.

There's an interesting blog post here that may help with an understanding of what's going on, and a few suggested ways to retry and fix our problem.