Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

fread fails when warning is caught: "Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call" #2904

Open slazicoicr opened 6 years ago

slazicoicr commented 6 years ago

The three calls below behave as expected:

fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")
fread("fails, rather\nbadly, too\nbad", header = FALSE, sep=",", sep2=",")
fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")

The first and third calls read their input fine, and the second throws a warning.

The problem appears when the call that throws the warning is wrapped in a tryCatch() block:

tryCatch({
  fread("fails, rather\nbadly, too\nbad", header = FALSE, sep=",", sep2=",")
}, warning = function(w) {
  conditionMessage(w)
})

fread("will, work\njust, fine\nthank, you", header = FALSE, sep=",", sep2=",")

The last fread() call then throws a warning, even though it should work just fine:

Warning message:
In fread("will, work\njust, fine\nthank, you", header = FALSE, sep = ",",  :
  Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call.

sessionInfo:

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8       
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4  

mattdowle commented 6 years ago

Thanks for reporting. options(warn=2) was anticipated and does not generate this warning. But trapping warning() via tryCatch() wasn't anticipated, unfortunately. Something 'unknown' was anticipated, though, and that's the coping mechanism you're seeing: fread() cleans itself up on the next call and issues an informative warning. It could have been a lot worse. I don't know of a way to tell at R level whether tryCatch(..., warning=) has been specified as something that halts or not. Any attempt at that is likely to be messy and fragile.

One solution might be for DTWARN in fread.c to cache the warning(s) in a private buffer and then call R/Python's warning() on exit after freadCleanup().
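The buffering idea can be illustrated at the R level. This is only a sketch: `run_buffered()`, `work` and `cleanup` are hypothetical names, not data.table internals, and the actual proposal concerns `DTWARN` in C. Each warning is recorded and muffled while the work runs, the cleanup runs next, and only then are the buffered messages re-raised, so a caller's `tryCatch(warning = )` can no longer skip the cleanup.

```r
# Hypothetical sketch: buffer warnings raised during `work`, run `cleanup`,
# and only then re-emit the buffered warnings. (Errors are not handled here.)
run_buffered <- function(work, cleanup) {
  buffered <- character(0)
  result <- withCallingHandlers(
    work(),
    warning = function(w) {
      buffered <<- c(buffered, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")                  # and let `work` continue
    }
  )
  cleanup()  # state is released before any warning can unwind the stack
  for (msg in buffered) warning(msg, call. = FALSE)
  result
}

# Usage: the caller traps the warning, yet cleanup has already happened.
cleaned <- FALSE
caught <- tryCatch(
  run_buffered(
    work = function() { warning("parse problem"); 42 },
    cleanup = function() cleaned <<- TRUE
  ),
  warning = function(w) conditionMessage(w)
)
```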

st-pasha commented 6 years ago

What if we wrap the entire R fread(...) call in one big tryCatch(..., finally=freadCleanup) call?

mattdowle commented 6 years ago

That's a neat idea! That would mean adding tryCatch(..., finally=.Call(CfreadCleanup)) around the .Call(CfreadR) here, I guess: https://github.com/Rdatatable/data.table/blob/master/R/fread.R#L101
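A minimal sketch of why the `finally=` approach works, assuming nothing about data.table's internals: `read_with_finally()` is a hypothetical wrapper, and `parse`/`cleanup` are stand-ins for `.Call(CfreadR, ...)` and `.Call(CfreadCleanup)`. When a caller's `tryCatch(warning = )` unwinds the stack mid-call, the inner `finally` expression still runs before the caller's handler.

```r
# Hypothetical sketch: `parse` stands in for .Call(CfreadR, ...) and
# `cleanup` for .Call(CfreadCleanup); neither is the real data.table code.
read_with_finally <- function(parse, cleanup) {
  tryCatch(parse(), finally = cleanup())
}

events <- character(0)
out <- tryCatch(
  read_with_finally(
    parse = function() {
      events <<- c(events, "parsing")
      warning("bad line")              # the caller's handler unwinds from here
      events <<- c(events, "never reached")
      "data"
    },
    cleanup = function() events <<- c(events, "cleanup")
  ),
  warning = function(w) paste("caught:", conditionMessage(w))
)
# `events` records that cleanup ran even though parsing was interrupted.
```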

dhersz commented 3 years ago

Hello. I'm running into this same problem when trying to catch parsing-related warnings in a custom file-reading function I'm writing. I've seen other packages use readr::problems to do this, but I'm trying to stick with data.table.

Do you have any suggestions on how to do this while avoiding this warning? I'll post the issue I created in my own repo below to illustrate the error I'm facing.

You can see that the line gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") is what causes the warning to be thrown in the first place. When I run the same command again, it throws an unzip-related error. However, if I pass stop_times as the second argument (which means only that file is unzipped and overwritten, not the others, such as stops, which causes the problem), I can then run gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") again, apparently because of the cleanup.

I actually have an on.exit() call that unlinks the temporary directory gtfsdir created by the function, but when there are parsing failures (i.e. when I catch the warnings) the directory is not removed, even with force = TRUE. I also tried removing the directory manually, but I get a message that the file is being used by another program, so the directory cannot be removed.

P.S.: "não foi possível abrir o arquivo" is Portuguese for "could not open the file".


Another problem is that fread() doesn't seem to play well with tryCatch(). Check this:

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#>   Parsing failures while reading the following file(s): trips, stops
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Error in utils::unzip(path, files = files_to_read, exdir = temp_dir, overwrite = TRUE) : 
#>   não foi possível abrir o arquivo 'C:/Users/Usuario/AppData/Local/Temp/RtmpqS6Xww/gtfsdir/stops.txt': Invalid argument
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip", "stop_times")
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Warning message:
#> In read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip") :
#>   Parsing failures while reading the following file(s): trips, stops

I actually suppress a possible warning when reading the gtfs because of how the function is structured. If I remove this warning suppression, you can see that fread() does some cleanup after the "Invalid argument" error:

gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip")
#> Error in utils::unzip(path, files = files_to_read, exdir = temp_dir, overwrite = TRUE) : 
#>   não foi possível abrir o arquivo 'C:/Users/Usuario/AppData/Local/Temp/RtmpqS6Xww/gtfsdir/stops.txt': Invalid argument
gtfs <- read_gtfs("../msc-thesis/otp/graphs/rio/ola.zip", "stop_times")
#> Warning message:
#> In data.table::fread(file.path(temp_dir, file), nrows = 1) :
#>   Previous fread() session was not cleaned up properly. Cleaned up ok at the beginning of this fread() call.

Related data.table issue: https://github.com/Rdatatable/data.table/issues/2904
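For the temporary-directory problem described above, one possible structure is sketched below. It is not the actual read_gtfs() code: `read_and_remove()` and its `reader` argument are hypothetical, and in practice `reader` would be something like `data.table::fread`. The `unlink()` is registered with `on.exit()`, and per-file warnings are recorded with `withCallingHandlers()` instead of being trapped by `tryCatch()`, so each reader call is presumably allowed to finish (and release its file) before the directory is removed. A single summary warning is raised at the end.

```r
# Hypothetical sketch: read every file in `dir`, note which ones warned,
# and always remove `dir` when the function exits.
read_and_remove <- function(dir, reader) {
  on.exit(unlink(dir, recursive = TRUE, force = TRUE), add = TRUE)
  failed <- character(0)
  paths <- list.files(dir, full.names = TRUE)
  out <- lapply(paths, function(p) {
    withCallingHandlers(
      reader(p),
      warning = function(w) {
        failed <<- c(failed, basename(p))  # record the file, don't interrupt
        invokeRestart("muffleWarning")
      }
    )
  })
  names(out) <- tools::file_path_sans_ext(basename(paths))
  if (length(failed) > 0) {
    warning("Parsing failures while reading the following file(s): ",
            paste(unique(tools::file_path_sans_ext(failed)), collapse = ", "),
            call. = FALSE)
  }
  out
}

# Usage with toy files; a base-R reader stands in for data.table::fread.
demo_dir <- file.path(tempdir(), "demo_files")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1", "2"), file.path(demo_dir, "stops.txt"))
writeLines(c("3", "x"), file.path(demo_dir, "trips.txt"))
summary_msg <- NULL
parsed <- withCallingHandlers(
  read_and_remove(demo_dir, function(p) as.integer(readLines(p))),
  warning = function(w) {
    summary_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
```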

zoushucai commented 3 years ago

I have also run into this problem recently. Do you have any good solutions? I need to read multiple CSV files in turn, and depending on the warning message, some files need to be read with a symbol other than the default double quotation mark.

Thank you
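One option for this case is sketched below; it is not a data.table feature, and `read_with_fallback()` is a hypothetical helper. The first attempt runs under `withCallingHandlers()`, so it is not interrupted; if it warned, the reader is called again with different arguments, for example `read_with_fallback(path, data.table::fread, list(quote = ""))` to disable quote handling.

```r
# Hypothetical sketch: read once with default arguments; if that attempt
# warned, read again with `fallback_args` (e.g. list(quote = "") for fread).
read_with_fallback <- function(path, reader, fallback_args = list()) {
  warned <- FALSE
  result <- withCallingHandlers(
    reader(path),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")  # let the first attempt finish normally
    }
  )
  if (warned) {
    result <- do.call(reader, c(list(path), fallback_args))
  }
  result
}

# Usage with a toy reader that only succeeds in non-strict mode.
toy_reader <- function(path, strict = TRUE) {
  if (strict) {
    warning("found an unbalanced quote in ", path)
    return(NULL)
  }
  paste("read", path, "without quote handling")
}
res <- read_with_fallback("data.csv", toy_reader, list(strict = FALSE))
```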

dhersz commented 3 years ago

Hi @zoushucai, I started using withCallingHandlers() instead of tryCatch() because it doesn't interrupt the running code. It may suit your needs as well.
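A minimal sketch of that approach; `collect_warnings()` is a hypothetical helper, not part of data.table. Unlike `tryCatch(warning = )`, the calling handler records the message and then resumes evaluation through the `muffleWarning` restart, so the wrapped call (for example `collect_warnings(data.table::fread("file.csv"))`, with a placeholder path) is not interrupted midway, which is what left fread's session uncleaned in the reports above.

```r
# Hypothetical helper: evaluate `expr`, returning its value together with
# any warnings it raised, without interrupting the evaluation.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # resume instead of unwinding
    }
  )
  list(value = value, warnings = msgs)
}

# Usage: as.integer() warns about the coercion but still returns its result.
res <- collect_warnings(as.integer(c("1", "x")))
```

The value and the warnings come back together, so the caller can inspect `res$warnings` and decide what to do after the read has fully completed.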

zoushucai commented 3 years ago

Thank you for your suggestion, @dhersz. My problem has been solved.

ldmax commented 2 years ago

> That's a neat idea! Would be adding tryCatch(..., finally=.Call(CfreadCleanup)) around the .Call(CfreadR) here I guess : https://github.com/Rdatatable/data.table/blob/master/R/fread.R#L101

Hi @mattdowle, could you please point out how I should use

    ...
    finally=freadCleanup
    ...

or

    ...
    finally=.Call(CfreadCleanup)
    ...

Every time I try either of the above, I get a 'C symbol name "freadCleanup" not in load table' error. Thank you in advance!