[🔊] Decouple table creation from row insertion

The current pandas_sqlite.InsertOrReplaceRecords (as well as the underlying pandas.io.sql methods) takes a data frame as input and:

- if the table doesn't exist yet, its schema is inferred from the input data frame to create the table before proceeding to insert rows into it; or
- otherwise the new rows are inserted into the already existing table.

This works well, except when we try to involve multiprocessing. There is a race condition in which several child processes may detect that the table does not yet exist, and all of them try to create it at once.

To solve this we should:

1. On the parent process, if it doesn't exist already, create a new (empty) table.
2. All of the child processes only add rows to an existing table.
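
A minimal sketch of that split, using only the standard-library sqlite3 module. The table name, column names, and the helper names (`create_table`, `insert_or_replace_rows`, `count_rows`) are hypothetical and not taken from pandas_sqlite; this is only meant to illustrate the idea.

```python
import sqlite3

# Hypothetical schema, chosen only for illustration.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS records (
    id INTEGER PRIMARY KEY,
    name TEXT,
    value REAL
)
"""


def create_table(db_path):
    """Run once on the parent process, before any child is started."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(CREATE_SQL)
    finally:
        con.close()


def insert_or_replace_rows(db_path, rows):
    """Run on child processes; assumes the table already exists."""
    # The timeout lets a writer wait for the database lock instead of
    # failing immediately when another process is writing.
    con = sqlite3.connect(db_path, timeout=30)
    try:
        with con:
            con.executemany(
                "INSERT OR REPLACE INTO records (id, name, value) "
                "VALUES (?, ?, ?)",
                rows,
            )
    finally:
        con.close()


def count_rows(db_path):
    """Small helper to inspect the result."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    finally:
        con.close()
```

The parent would call `create_table` once and then hand `insert_or_replace_rows` to its workers (for example through a `multiprocessing.Pool`), so no child ever has to decide whether the table exists.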

To implement step 1, however, we also need to explicitly tell pandas the expected type of each column (to build the needed "empty" frame), instead of letting it infer the types from the data returned by dashboard API calls. That is probably a good thing anyway.
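
One possible way to build such an "empty" frame is sketched below. It assumes a plain sqlite3 connection and uses `DataFrame.to_sql` to create the table. The column names, the dtypes, and the helper names are hypothetical, and this approach does not set up primary keys, which the real function may need.

```python
import sqlite3

import pandas as pd

# Hypothetical column names and dtypes, for illustration only.
COLUMN_TYPES = {"id": "int64", "name": "object", "value": "float64"}


def empty_frame(column_types):
    """Build a zero-row frame whose columns carry explicit dtypes."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in column_types.items()}
    )


def table_exists(con, table_name):
    """Check sqlite_master for a table with the given name."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


def create_table_from_types(con, table_name, column_types):
    """Create the table from an empty typed frame if it is missing."""
    if not table_exists(con, table_name):
        # With zero rows, to_sql only issues the CREATE TABLE statement.
        empty_frame(column_types).to_sql(table_name, con, index=False)
```

Because the schema comes from the declared dtypes rather than from whatever data happens to arrive first, every run produces the same table definition.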

@zeptonaut

catapult-project / catapult

[🔊] Decouple table creation from row insertion #4442