HenrikBengtsson / R.utils

🔧 R package: R.utils (this is *not* the utils package that comes with R itself)
https://henrikbengtsson.github.io/R.utils/

is there another way to list all files on Windows? #152

Open olivroy opened 8 months ago

olivroy commented 8 months ago

This code has been around for a very long time.

https://github.com/HenrikBengtsson/R.utils/blob/0382c2e4628f53e5ca9ef821484e28443b87cc2f/R/Sys.readlink2.R#L49-L50

Note that code moved to: https://github.com/HenrikBengtsson/R.utils/blob/2daa30cd8598a917781fa45cccda8ee58dee2c7d/R/Sys.readlink.Windows.R#L32-L40

Would there be another way to do this?

It feels like Windows loses the encoding.

Running dir directly in the terminal shows the output correctly, but when the same output is read into R, the encoding is lost.

I tried many ways to fix it, but I can't find the solution.

When I execute this in the shell

  10 Rép(s)  46 547 070 976 octets libres

but in R

shell("dir", shell = Sys.getenv("COMSPEC"),
      mustWork = TRUE, intern = TRUE)
#> [31] "              10 R\x82p(s)  47\xff277\xff817\xff856 octets libres"
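The escaped bytes in that output are consistent with cmd.exe writing in an OEM code page while R reads the bytes as if they were in its own (UTF-8) encoding. A minimal sketch of re-decoding such a line, assuming the console uses code page 850 (an assumption; `chcp` in the terminal reports the active one):

```r
# Bytes as R received them from `dir` (shortened from the output above).
raw_line <- "10 R\x82p(s)  47\xff277\xff817\xff856 octets libres"

# Assumption: the console wrote OEM code page 850.
# In CP850, 0x82 is "é" and 0xFF is a no-break space.
decoded <- iconv(raw_line, from = "CP850", to = "UTF-8")

print(decoded)
```

If the assumed code page is wrong, iconv() may return NA or different characters, so this only illustrates the mismatch and is not a fix for the parsing code.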

follow-up to https://github.com/HenrikBengtsson/R.cache/issues/52

HenrikBengtsson commented 8 months ago

Thanks for all your troubleshooting around this.

There's actually another reason for revisiting this very old code; it can be extremely slow when there are a lot of files/subfolders in the directory scanned, cf. [R-pkg-devel] Unusually long execution time for R.utils::gzip on r-devel-windows, 2024-02-17 (https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010470.html).

So, yes, this needs to be fixed, but I'm not sure how. I might end up disabling the use of Sys.readlink2() in fileAccess() et al., with an R option to re-enable it.

Before even doing that, I should try to create a unit test that reproduces the problem you're experiencing using a non-English locale. BTW, what does Sys.getlocale() report on your machine?
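For reference, base R can enumerate directory entries without spawning cmd.exe, so the console code page never enters the picture. A minimal sketch follows; the directory and file names are made up for illustration, and it only lists entries, so it is not a replacement for resolving Windows link targets:

```r
# Hypothetical demo directory with a few entries.
tmp <- file.path(tempdir(), "listing-demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("a.txt", ".hidden"))))
dir.create(file.path(tmp, "sub"), showWarnings = FALSE)

# List all entries, including hidden ones, but skip "." and "..".
entries <- list.files(tmp, all.files = TRUE, no.. = TRUE)

# Look up basic metadata through R's own file API.
info <- file.info(file.path(tmp, entries))
listing <- data.frame(name = entries, isdir = info$isdir, size = info$size)

print(listing)
```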

olivroy commented 8 months ago

Sys.getlocale()
#> "LC_COLLATE=French_Canada.utf8;LC_CTYPE=French_Canada.utf8;LC_MONETARY=French_Canada.utf8;LC_NUMERIC=C;LC_TIME=French_Canada.utf8"

I run 64-bit (x64) R, but it somehow uses cmd.exe from system32?

Sys.getenv("COMSPEC")
#> "C:\\WINDOWS\\system32\\cmd.exe"

HenrikBengtsson commented 8 months ago

Just making notes for the record here. It's not as easy as setting LANGUAGE before calling R in order to try to reproduce this on an English system. The shell("dir", ...) command still responds with English output, e.g.

C:\Users\hb>set LANGUAGE=fr_FR.utf8
C:\Users\hb>R

R version 4.3.2 (2023-10-31 ucrt) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R est un logiciel libre livré sans AUCUNE GARANTIE.
Vous pouvez le redistribuer sous certaines conditions.
Tapez 'license()' ou 'licence()' pour plus de détails.

R est un projet collaboratif avec de nombreux contributeurs.
Tapez 'contributors()' pour plus d'information et
'citation()' pour la façon de le citer dans les publications.

Tapez 'demo()' pour des démonstrations, 'help()' pour l'aide
en ligne ou 'help.start()' pour obtenir l'aide au format HTML.
Tapez 'q()' pour quitter R.

> shell("dir")
 Volume in drive C has no label.
 Volume Serial Number is CEF7-2D3F

 Directory of C:\Users\hb

01/11/2023  12:14    <DIR>          .
01/11/2023  12:14    <DIR>          ..
10/01/2023  12:26                70 .bashrc
...

Same with shell("dir", shell=Sys.getenv("COMSPEC")). It could be that one would have to change the language on the whole system, which is probably not reasonable.

UPDATE: This also means we cannot force English output from within R, which otherwise could have solved the problem reported here.

HenrikBengtsson commented 8 months ago

@olivroy, as a first step, I've updated the develop branch so that you can disable the call to shell("dir"). Could you please try that version and see what happens if you set the environment variable R_R_UTILS_SYS_READLINKS2_WINDOWS=FALSE before loading R.utils, e.g.

Sys.setenv(R_R_UTILS_SYS_READLINKS2_WINDOWS = "FALSE")
library(R.utils)
...

If it works as intended, you shouldn't get any warnings.
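As an illustration of how an on/off environment variable like this can be interpreted in R, here is a minimal sketch. The helper env_flag() is hypothetical and is not how R.utils itself reads the setting; the variable name is copied as written in this comment:

```r
# Hypothetical helper (not part of R.utils): read an environment variable
# as a logical flag, falling back to a default when it is unset or empty.
env_flag <- function(name, default = TRUE) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) return(default)
  flag <- as.logical(value)
  if (is.na(flag)) {
    stop("Environment variable ", name, " is not a logical value: ", value)
  }
  flag
}

# Variable name as written in this thread.
Sys.setenv(R_R_UTILS_SYS_READLINKS2_WINDOWS = "FALSE")
use_dir_scan <- env_flag("R_R_UTILS_SYS_READLINKS2_WINDOWS")
print(use_dir_scan)
```

A helper like this treats an unset variable as "use the default", so only an explicit FALSE turns the feature off.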

olivroy commented 8 months ago

Hi, thanks for looking into it. Unfortunately it didn't work. I think it is due to a minor typo. Fixed in #154

HenrikBengtsson commented 8 months ago

Oops, I thought I fixed that, but that was another typo. Merged. Does it work for you?

olivroy commented 8 months ago

Seems to work! I wonder if you agree that #155 should be merged too?

My rationale still holds:

  1. There is nothing a user can do about this warning.
  2. There are sometimes up to 10 warnings printed to the console from a single styler:::style_active_file() call.
  3. It took me a long time to figure out what it is.
  4. The functionality seems to work even if the warning is shown.
  5. It fills up the console for no good reason.
  6. It may be difficult to learn about the new env var.