duckdb / duckdb-r

The duckdb R package
https://r.duckdb.org/
Other
111 stars 22 forks source link

How to deal with `IO Error: Cannot open file...used by another process`? #56

Closed daranzolin closed 4 months ago

daranzolin commented 8 months ago

I've started implementing duckdb at work using {targets} and RStudio. However, between a couple coworkers, various workflows are persistently ground to a halt with:

Last error: rapi_startup: Failed to open database: IO Error: Cannot open file "...": The process cannot access the file because it is being used by another process.

We've tried being extra vigilant by always connecting with:

  con <- DBI::dbConnect(
    duckdb::duckdb(),
    dbdir = ...,
    read_only = TRUE
  )

And always disconnecting with:

DBI::dbDisconnect(con, shutdown = TRUE)

But when where we need to update tables in the database and we connect with read_only = FALSE, the 'used by another process' error always appears and we can't ascertain where there's a lingering connection, even after disconnecting, clearing our environments, and restarting R. Is this possibly an RStudio thing? Any guidance on managing connections would be greatly appreciated.

daranzolin commented 8 months ago

Probably related to https://github.com/duckdb/duckdb/issues/7826

daranzolin commented 8 months ago

Update: I get the same error connecting to the database with DBeaver, so it's probably unrelated to this package. The only thing that's worked so far is my coworkers and I closing all of our RStudio sessions, so now I'm thinking it's a duckdb-RStudio thing.

Tmonster commented 7 months ago

Hi daranzolin,

Duckdb recently merged a PR that adds information about what process is using the duckdb file. Don't know if it helps though.

https://github.com/duckdb/duckdb/pull/9988

daranzolin commented 7 months ago

@Tmonster thanks for letting me know! Does updating duckdb the R package include the updates to the duckdb source repo?

Tmonster commented 7 months ago

Hi daranzolin,

The current duckdb R package does not have this feature yet. There is a new Duck release planned soon, which will prompt a new duckdb-r package soon after. You can build a developmental build of the duckdb-r package by following these instructions if you are curious about the new functionality. You can let me know here if you have any questions

https://github.com/duckdb/duckdb-r/?tab=readme-ov-file#building

nicki-dese commented 7 months ago

I would just like to add that we've come across this error too, despite also being vigilant about disconnecting with DBI::dbDisconnect(con, shutdown = TRUE). I too have found the only workaround is to quit R studio entirely. We would also like to use duckdb in a targets pipeline. It has occurred for us in two completely different environments (my local work machine, and a VM)

krlmlr commented 5 months ago

This looks much better in #124, I'll merge it today, binaries will be available on https://duckdb.r-universe.dev/duckdb# soon. Can you confirm?

nicki-dese commented 5 months ago

Hi @krlmlr - unfortunately it will be difficult for me to confirm until a release is published on CRAN, because of the two environments I work in. Hopefully the OP can test.

krlmlr commented 4 months ago

@nicki-dese @daranzolin: Are you still seeing this with v0.10.1?

daranzolin commented 4 months ago

Thanks @krlmlr! Trying to test this now, but ironically I can't update/install it because of the same error:

* installing *source* package 'duckdb' ...
ERROR: cannot remove earlier installation, is it in use?
* removing '.../AppData/Local/R/win-library/4.2/duckdb'
* restoring previous '.../AppData/Local/R/win-library/4.2/duckdb'
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
  problem copying ...\AppData\Local\R\win-library\4.2\00LOCK-duckdb\duckdb\libs\x64\duckdb.dll to ...duckdb.dll: Permission denied

I'll try shutting everything down and restarting later this week.

krlmlr commented 4 months ago

Thanks, David. Windows loves locking files...

nicki-dese commented 4 months ago

Thanks so much for following up @krlmlr - I've had time to test in one of the two environments (fairly quickly, and not using targets), and it seems to have stopped occurring! :)

Having said that - what is the expected behaviour if you try to connect to the same database twice, both with read_only = F, i.e.:

con <- dbConnect(duckdb(), dbdir = "quack.duckdb", read_only = FALSE)

My expectation would be an error with the second attempt, because duckdb only allows one write connection to avoid conflicts.

It used to give you the The process cannot access the file because it is being used by another process. error posted above, but now in v0.10.1 it doesn't error (or give a warning) at all.

Let me know if I can help further.

krlmlr commented 4 months ago

Thanks, @nicki-dese. Two read-write connections from the same R session are supported, connecting from different R sessions should raise an error. Can you confirm?

nicki-dese commented 4 months ago

Hi @krlmlr - yes, if I open another R session by opening a 2nd instance of R studio, it does give me an error.

A very sensible error:

Error: rapi_startup: Failed to open database: {"exception_type":"IO","exception_message":"File is already open in \nC:\\Program Files\\RStudio\\resources\\app\\bin\\rsession-utf8.exe (PID 17432)"}

I think something a bit weird is going on between connecting to a database with read_only = T, disconnecting, and then going back and connecting with read_only = F. It doesn't seem to update the connection type, even if you delete the connection object, and then try connecting with read_only = F.

I have attached an excel file showing the testing I've done - let me know if that format doesn't work for you. duckdb_connection_testing_results.xlsx

krlmlr commented 4 months ago

Thanks. Yes, I think this is a consequence of the "read-onlyness" being a property of the driver object (created by duckdb()) and not of the connection. This is why opening two read-write connections to the same file in the same session works.

In this scenario, what happens if you add one or two gc() after disconnecting?

This is discussed in https://github.com/duckdb/duckdb-r/issues/83#issuecomment-2016836198, let's continue there.

krlmlr commented 4 months ago

Thanks for the update in #83. I was confused, #126 is the canonical issue. I'll also direct all traffic from #83 there.