druid-io / pydruid

A Python connector for Druid

Fix backslash handling in rows_from_chunks #249

Open ahiijny opened 3 years ago

ahiijny commented 3 years ago

This should fix #242.

When deciding whether a quote character ends a string, rows_from_chunks only checked whether the preceding character was a backslash. It did not check whether that backslash was itself escaped.

So if a row contains a string with a properly escaped backslash right before the closing quote (e.g. "\\"), the row chunker treats the closing quote as escaped and keeps reading what follows as part of the string. The braces it sees in the rest of the input then never balance, so it never properly processes the rest of the JSON.
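To make the failure concrete, here is a minimal sketch of the previous-character check in isolation. The function name closing_quote_naive is hypothetical and this is not the pydruid code, only an illustration of the same decision rule.

```python
def closing_quote_naive(text):
    """Find the closing quote of a JSON string literal that starts at text[0].

    Hypothetical helper, not pydruid code: a quote counts as the end of the
    string unless the character right before it is a backslash.
    """
    for i in range(1, len(text)):
        if text[i] == '"' and text[i - 1] != "\\":
            return i
    return None  # no closing quote was recognised


# An escaped quote inside the string is skipped correctly.
print(closing_quote_naive(r'"a\"b"'))  # 5

# An escaped backslash right before the closing quote is misread:
# the final quote looks escaped, so the string never appears to end.
print(closing_quote_naive(r'"\\"'))  # None
```

Once the string never appears to end, any braces after it are counted as string content, which is why the rest of the input is never split into rows.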

I changed rows_from_chunks to track, while inside a string, whether the current character is escaped, and added a couple of unit tests that cover this scenario.
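The general idea of tracking escape state can be sketched as follows. This is an illustrative stand-alone chunker, not the code from this PR: the name iter_rows and the assumption that the input is a JSON array of objects arriving as text chunks are mine.

```python
import json


def iter_rows(chunks):
    """Yield each top-level JSON object from a JSON array streamed as text chunks.

    Hypothetical sketch: escape state is carried from character to character,
    so a backslash only escapes the next character if it is not escaped itself.
    """
    buf = []           # characters of the object currently being collected
    depth = 0          # brace nesting depth outside of strings
    in_string = False  # currently inside a JSON string literal
    escaped = False    # previous character was an unescaped backslash

    for chunk in chunks:
        for ch in chunk:
            if depth > 0:
                buf.append(ch)
            if in_string:
                if escaped:
                    escaped = False      # this character is consumed by the escape
                elif ch == "\\":
                    escaped = True       # next character is escaped
                elif ch == '"':
                    in_string = False    # unescaped quote ends the string
            elif ch == '"':
                in_string = True
            elif ch == "{":
                if depth == 0:
                    buf.append(ch)       # start of a new row
                depth += 1
            elif ch == "}" and depth > 0:
                depth -= 1
                if depth == 0:           # braces balanced: one full row
                    yield json.loads("".join(buf))
                    buf = []


# The first row ends with an escaped backslash right before the closing quote,
# and the second row is split across two chunks.
chunks = [r'[{"path": "C:\\"}, {"n', r'ote": "a}b"}]']
rows = list(iter_rows(chunks))
print(rows)
```

Because the state lives in variables rather than in a look-behind at the previous character, it also survives chunk boundaries, including a chunk that ends on a backslash.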