eddelbuettel / digest

R package to create compact hash digests of R objects
https://eddelbuettel.github.io/digest

digest does not work on internet drive #181

Closed: kongdd closed this issue 1 year ago

kongdd commented 1 year ago
library(magrittr)
library(fs)

# `devtools::document` used the function `fs::path_real`
f = fs::path_real("Z:/Researches/ET_evaluation") %>% paste0("/NAMESPACE")
print(f)
#> [1] "//kong-nas/CMIP6/Researches/ET_evaluation/NAMESPACE"
readLines(f) %>% head()
#> [1] "# Generated by roxygen2: do not edit by hand"
#> [2] ""                                            
#> [3] "S3method(as.Date,PCICt)"                     
#> [4] "S3method(as.data.table,SpatRaster)"          
#> [5] "export(as.SpatRaster.SpatialPixelsDataFrame)"
#> [6] "export(as_SpatialPixelsDataFrame)"

digest::digest(file = f)
#> Error: The specified file is not readable: //kong-nas/CMIP6/Researches/ET_evaluation/NAMESPACE

Created on 2022-11-16 with reprex v2.0.2

Is this output expected? Any idea how to solve it?

This issue was initially reported here: https://github.com/r-lib/roxygen2/issues/1443.

eddelbuettel commented 1 year ago

No idea, and this is not a minimal reproducible example. Please try it after removing magrittr (the base pipe |> should work fine) and fs (the function file.path() works fine).

After that, it may just be your local permissions. There is strictly nothing digest (or any other R package) can do to alter how you are creating a file under Windows.

kongdd commented 1 year ago

Sorry for the disturbance. Indeed, it is hard to reproduce: the bug occurs only on the internet drive.

But I can confirm that it is not a file-permission problem. As shown in the script above, readLines works, as do tools::md5sum, cli::hash_file_md5 and rlang::hash_file (see details in https://github.com/r-lib/roxygen2/issues/1443). Only digest fails. Hence, I believe the issue lies in digest.

kongdd commented 1 year ago

As the problem has been solved in https://github.com/r-lib/roxygen2/issues/1443, I am closing this issue.

eddelbuettel commented 1 year ago

Based on over 30 years of using Unix, I doubt that. digest, as an R package, does not reinvent or re-supply file system operations. You have the package sources to check that. It reads files just like any other R package: by asking the operating system to give them to it. And I don't think we add lockfiles or anything fancy.

eddelbuettel commented 1 year ago

That was a moderately poor bug report of yours. Skimming the issue over there does not in any way implicate digest (and it cannot; see my previous comment). Gabor simply mentions that different functions supply md5sum.

kongdd commented 1 year ago

I have no idea what happened to digest. But the reality is that the others work, while digest does not. If this report does not help, please ignore it.

kongdd commented 1 year ago

The same issue was reported 6 months ago: https://stackoverflow.com/questions/67266275/r-package-namespace-file-is-not-readable.

"I did move the package from my computer to my workplace's server"

We are in the same situation: an internet drive.

eddelbuettel commented 1 year ago

It is always possible that something is in fact wrong, but I would need your help: yours, and nothing but base::readLines(), tools::md5sum() and digest::digest().

On my system, picking a file known to exist in every R installation and accessible programmatically:

> R.home()
[1] "/usr/lib/R"
> R.home("COPYING")
[1] "/usr/lib/R/COPYING"
> myfile <- R.home("COPYING")
> length(readLines(myfile))
[1] 339
> tools::md5sum(myfile)
                /usr/lib/R/COPYING 
"b234ee4d69f5fce4486a80fdaf4a4263" 
> digest::digest(file=myfile)
[1] "b234ee4d69f5fce4486a80fdaf4a4263"
> 

Please do that at your end, and then replace myfile with something you create on the network drive. How you create it may matter; how you read it should not.

Also note that pointing at NAMESPACE (when created by roxygen2 or other helpers) has nothing to do with digest.

kongdd commented 1 year ago
setwd("z:") # z is my internet drive

## 01 example: works well on local drive
R.home()
#> [1] "C:/PROGRA~1/R/R-42~1.2"
R.home("COPYING")
#> [1] "C:/PROGRA~1/R/R-42~1.2/COPYING"
f <- R.home("COPYING")
length(readLines(f))
#> [1] 340
tools::md5sum(f)
#>     C:/PROGRA~1/R/R-42~1.2/COPYING 
#> "eb723b61539feef013de476e68b5c50a"
digest::digest(file=f)
#> [1] "eb723b61539feef013de476e68b5c50a"

## 02 example: failed on internet drive
# create a simple file
f = normalizePath("a.txt")
d = data.frame(x = 1:10)
write.table(d, f)

print(f)
#> [1] "Z:\\Researches\\ET_evaluation\\a.txt"
print(normalizePath(f))
#> [1] "Z:\\Researches\\ET_evaluation\\a.txt"
print(fs::path_real(f))
#> //kong-nas/CMIP6/Researches/ET_evaluation/a.txt

# test readable
length(readLines(f))
#> [1] 11
tools::md5sum(f)
#> Z:\\Researches\\ET_evaluation\\a.txt 
#>   "f0715a2822c23d66958ebd738e515ace"

# trace(digest::digest, edit = TRUE)
digest::digest(file=f)
#> Error: The specified file is not readable: Z:\Researches\ET_evaluation\a.txt
digest::digest(file=basename(f))
#> Error: The specified file is not readable: a.txt
digest::digest(file=fs::path_real(f))
#> Error: The specified file is not readable: //kong-nas/CMIP6/Researches/ET_evaluation/a.txt

Created on 2022-11-16 with reprex v2.0.2

If this issue does exist, please help remove the "not useful" vote on my Stack Overflow answer. I posted it for the good of other users who face the same issue.

eddelbuettel commented 1 year ago

Now that does look like a reproducible bug!! Can you print(f) as well and maybe go into the digest source just before the file is opened, for example in this block:

https://github.com/eddelbuettel/digest/blob/565c960c619e2d6c80dae42c0d47605546ad9eb3/R/digest.R#L95-L100

The intention of the path.expand() call may have been to protect the path. We already special-case Windows here and can likely protect the path if this is needed. Can you check (maybe by adding print() or cat() statements) where this ends poorly, i.e. whether the code flows past a) path.expand() and b) check_file(), so that the failure is in c) the digest_impl() call in C?

kongdd commented 1 year ago

Sorry, I have trouble controlling my office PC remotely. It is 11pm local time here. I will continue tomorrow.

eddelbuettel commented 1 year ago

Ok, let's continue tomorrow. We ended up in a more constructive spot here.

eddelbuettel commented 1 year ago

In the above though you claim:

f = "a.txt"

is on an internet drive. How did you get to the internet drive? Can you show me the setwd() command and the full path?

kongdd commented 1 year ago

Updated. See details at https://github.com/eddelbuettel/digest/issues/181#issuecomment-1317134887

eddelbuettel commented 1 year ago

Thanks. That is weird. You and I may even have to go down to simple C(++) examples (Rcpp helps) to narrow this down. Basically, the idea of the function check_file() is to catch these cases and alert the user. Something on your network drive is demonstrably different from what lots of other users have experienced (with digest now being 20 years old and widely used).

eddelbuettel commented 1 year ago

Try digest::digest(file=normalizePath(f)). If f is a.txt but the actual path is different, as shown, then you get an access error when you try to work with the "different and non-existing" path. That is more akin to S3 buckets versus full paths.

kongdd commented 1 year ago
r$> digest::digest(file = f)
Error: The specified file is not readable: a.txt
r$> traceback()
4: stop(txt, obj, call. = FALSE)
3: .errorhandler("The specified file is not readable: ", object,
       mode = errormode)
2: check_file(object, errormode)
1: digest::digest(file = f)

The error is thrown by check_file.

eddelbuettel commented 1 year ago

digest(file=f) reporting that a.txt is not readable is correct, given what you showed about f being "a.txt" and normalizePath(f) being something entirely different.

Just see what I wrote in my preceding comment and try digest(file=normalizePath(f)).

Also, check_file() is a simple helper function defined in the package. Try the three tests therein, for f and for normalizePath(f). It simply looks like your filesystem is non-standard and needs help. Which help, and where, we have not been able to tell, as your replies were not focussed enough.
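The checks referred to here can be run side by side from base R. The sketch below assumes, based on the traceback above and on the later finding about file.access, that check_file() relies on base functions such as file.exists() and file.access(); the helper name diagnose_path() is hypothetical and not part of digest. For a path and its normalizePath() form, it reports whether the file exists, whether it is a directory, what file.access(mode = 4) says, and whether tools::md5sum() can actually read it.

```r
# Hypothetical helper (not part of digest): compare what base R's
# existence/permission checks report with an actual read attempt.
diagnose_path <- function(path) {
  # Resolve relative paths against the working directory; with
  # mustWork = FALSE no error or warning is raised if resolution fails.
  norm  <- normalizePath(path, mustWork = FALSE)
  paths <- unique(c(path, norm))
  data.frame(
    path        = paths,
    exists      = file.exists(paths),
    is_dir      = dir.exists(paths),
    # file.access() returns 0 for success and -1 for failure
    read_access = unname(file.access(paths, mode = 4)) == 0,
    # tools::md5sum() actually opens and reads the file; NA if it cannot
    md5         = unname(tools::md5sum(paths)),
    stringsAsFactors = FALSE
  )
}
```

On the network share, a row where md5 is non-NA but read_access is FALSE would point at file.access() rather than at the file itself.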

kongdd commented 1 year ago

See details at https://github.com/eddelbuettel/digest/issues/181#issuecomment-1317134887

[screenshot not preserved]

eddelbuettel commented 1 year ago

Ok. This is now between you and your admins. If R tells you

file is not readable

then there is little I can do. Ball in your court.

kongdd commented 1 year ago

I agree to leave this issue alone. Thanks for your time.

eddelbuettel commented 1 year ago

Also, to be plain, I asked you to run the subcommands inside of check_file(). If you can't do that, then we are done too.

https://github.com/eddelbuettel/digest/blob/565c960c619e2d6c80dae42c0d47605546ad9eb3/R/digest.R#L167-L179

kongdd commented 1 year ago

[screenshot not preserved]

file.access returns the wrong result.

As my issue has been solved, I don't want to spend time investigating why file.access returns wrong results. It is beyond my aim.

eddelbuettel commented 1 year ago

For the love of all that is sacred do NOT POST IMAGES. They are unusable in subsequent analysis.

I think you are not quite sure about what I am asking you, and you may be less experienced debugging or analysing this -- but as I do not have access to your filesystem there is little more I can do.

We have both spent an hour on this by now that we will never get back. Let's stop here. digest is open source. I encourage you to work out a modification that helps you in your circumstance, and contribute it back.

(a.txt does not exist. But you apparently refuse to run the analysis with the seemingly equivalent "Z:\\Researches\\ET_evaluation\\a.txt". So be it. Let's stop here.)

kongdd commented 1 year ago

The root of this issue should be file.access, which has also been reported before:

https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/gnuwin32/CHANGES0#L35-L38

https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/utils/R/packages2.R#L263-L272

We are debugging in the wrong direction. Rather than continuing to run the tests above, we should avoid using file.access on Windows.
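One way to act on this suggestion, sketched here as an assumption rather than as digest's actual code, is to test readability by trying to open the file, which is what readLines() and tools::md5sum() effectively do. R's own help page for file.access() also cautions against using it as a pre-check before opening a file. The function name can_read_by_opening() is hypothetical.

```r
# Hypothetical readability check (not digest's code): attempt to open the
# file and read one byte instead of asking file.access() about permissions.
# `path` is assumed to be a single file path.
can_read_by_opening <- function(path) {
  # Missing paths and directories are not readable files
  if (!file.exists(path) || dir.exists(path)) return(FALSE)
  # file() warns and then errors if it cannot open the path; silence the
  # warning and turn the error into NULL
  con <- tryCatch(suppressWarnings(file(path, open = "rb")),
                  error = function(e) NULL)
  if (is.null(con)) return(FALSE)
  on.exit(close(con))
  # An empty file is still readable: readBin() then returns raw(0)
  tryCatch({
    readBin(con, what = "raw", n = 1L)
    TRUE
  }, error = function(e) FALSE)
}
```

Whether such a check belongs in digest is a separate question; it only trades a permission query for an actual open attempt.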

eddelbuettel commented 1 year ago

No.

You just have to trust me on this.

I released digest twenty years ago. The comments you just quoted are from years ago, when Windows filesystems were bad. We eventually fixed that, with tests such as those in check_file().

digest has been downloaded and used millions of times and works on known filesystems. You, and so far only you, insist on working on another type of file system which behaves differently. Now, as you showed via fs and other packages, there are apparently workarounds, but I have been unable to get you to be of any real help. Unfortunately, I also look after a number of other projects, and those request my attention too. I will stop this here and lock the issue.