ccb-hms / nhanes-database

3 stars 4 forks source link

Change download / ETL process to directly insert tables via DBI interface #178

Open nathan-palmer opened 4 months ago

nathan-palmer commented 4 months ago

The problem with this approach is that DBI::dbAppendTable(...) does RBAR inserts, which will make a mess of the transaction log. The reason we have been writing delimited files to disk is that we can then BULK INSERT them (in SQL Server, other DBMS have similar tools), thus avoiding the transaction log for everything but structural changes (eg page extent addition) to the table.

rgentlem commented 4 months ago

would it then make sense to write the translated files out to disk then? I am not sure what the advantage to avoiding the transaction log is...so can't weigh the value vs the cost

On Thu, Mar 14, 2024 at 1:10 PM Nathan Palmer @.***> wrote:

The problem with this approach is that DBI::dbAppendTable(...) does RBAR inserts, which will make a mess of the transaction log. The reason we have been writing delimited files to disk is that we can then BULK INSERT them (in SQL Server, other DBMS have similar tools), thus avoiding the transaction log for everything but structural changes (eg page extent addition) to the table.

— Reply to this email directly, view it on GitHub https://github.com/ccb-hms/NHANES/issues/178#issuecomment-1997942221, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC7TWAZ5ARZ22WUCB5COSGLYYHKX7AVCNFSM6AAAAABENAALRSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJXHE2DEMRSGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Robert Gentleman @.***

nathan-palmer commented 4 months ago

Yes exactly. NathanOn Mar 14, 2024, at 1:33 PM, Robert Gentleman @.***> wrote: would it then make sense to write the translated files out to disk then?

I am not sure what the advantage to avoiding the transaction log is...so

can't weigh the value vs the cost

On Thu, Mar 14, 2024 at 1:10 PM Nathan Palmer @.***>

wrote:

The problem with this approach is that DBI::dbAppendTable(...) does RBAR

inserts, which will make a mess of the transaction log. The reason we have

been writing delimited files to disk is that we can then BULK INSERT them

(in SQL Server, other DBMS have similar tools), thus avoiding the

transaction log for everything but structural changes (eg page extent

addition) to the table.

Reply to this email directly, view it on GitHub

https://github.com/ccb-hms/NHANES/issues/178#issuecomment-1997942221,

or unsubscribe

https://github.com/notifications/unsubscribe-auth/AC7TWAZ5ARZ22WUCB5COSGLYYHKX7AVCNFSM6AAAAABENAALRSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJXHE2DEMRSGE

.

You are receiving this because you are subscribed to this thread.Message

ID: @.***>

--

Robert Gentleman

@.***

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

deepayan commented 4 months ago

We should look into whether this is an issue for the RPostgres version of dbAppendTable(); in particular, see the copy argument in https://rpostgres.r-dbi.org/reference/postgres-tables.html.

If I read this correctly, this should do a COPY instead of multiple INSERTS by default, which seems to be the Postgres recommendation for this situation (https://www.postgresql.org/docs/current/populate.html).