gesistsa / grafzahl

🧛 fine-tuning Transformers for text data from within R
https://gesistsa.github.io/grafzahl/
GNU General Public License v3.0

Make Grafzahl look for conda in the right place #24

Closed ureber closed 1 year ago

ureber commented 1 year ago

Hi there,

I'm having a bit of trouble getting Grafzahl to look for miniconda in the right place. Since I had to install miniconda manually, it's not in the usual folder (r-miniconda) but in a different one (miniconda3). I saw that this was an issue before (#20) and tried to solve it by specifying the right path for both RETICULATE_MINICONDA_PATH and GRAFZAHL_MINICONDA_PATH. However, detect_conda() still returns FALSE.

The problem, I suspect, is that the .gen_conda_path function adds "bin" and "conda" to the path, which then points to a folder/file that doesn't exist within miniconda3. In my case, the right path would be "condabin" and then "conda" (I guess). I don't know if this is a version or system related issue, but any idea on how to fix this or some workaround would be greatly appreciated.
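To illustrate the suspicion above, here is a minimal sketch that probes a miniconda root for a conda executable in several commonly used locations. The helper name `find_conda_binary` and the candidate list are assumptions for illustration; they are not part of grafzahl or reticulate.

```r
# Hypothetical helper (not part of grafzahl): return the first conda
# executable found under a miniconda root, or NA if none exists.
find_conda_binary <- function(root) {
    candidates <- file.path(root, c(
        "bin/conda",           # typical Linux/macOS layout
        "Scripts/conda.exe",   # commonly found on Windows
        "condabin/conda.bat"   # wrapper script, commonly found on Windows
    ))
    found <- candidates[file.exists(candidates)]
    if (length(found) == 0) {
        return(NA_character_)
    }
    found[1]
}

# Usage with the manually installed miniconda3 folder from this report:
find_conda_binary("C:/Users/ur21m712/AppData/Local/miniconda3")
```

If this returns NA for a folder that does contain a working miniconda, the layout differs from all three assumed candidates.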

Thanks!

chainsawriot commented 1 year ago

sessionInfo(), please.

Also please tell me the output of

reticulate::conda_list()

ureber commented 1 year ago

Sure!

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2019 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] Matrix_1.5-4.1    glmnet_4.1-7      gtable_0.3.3      jsonlite_1.8.7    compiler_4.3.1
 [6] Rcpp_1.0.11       grafzahl_0.0.8    assertthat_0.2.1  gower_1.0.1       splines_4.3.1
[11] scales_1.2.1      png_0.1-8         reticulate_1.30   lattice_0.21-8    ggplot2_3.4.2
[16] R6_2.5.1          shape_1.4.6       iterators_1.0.14  tibble_3.2.1      munsell_0.5.0
[21] lime_0.5.3        pillar_1.9.0      rlang_1.1.1       utf8_1.2.3        stringi_1.7.12
[26] fs_1.6.3          cli_3.6.1         withr_2.5.0       magrittr_2.0.3    foreach_1.5.2
[31] grid_4.3.1        rstudioapi_0.15.0 rappdirs_0.3.3    lifecycle_1.0.3   vctrs_0.6.3
[36] glue_1.6.2        codetools_0.2-19  survival_3.5-5    fansi_1.0.4       colorspace_2.1-0
[41] purrr_1.0.1       tools_4.3.1       usethis_2.2.2     pkgconfig_2.0.3

reticulate::conda_list() when RETICULATE_MINICONDA_PATH=C:/Users/ur21m712/AppData/Local/miniconda3/

Error in system2(conda, c("info", "--base"), stdout = TRUE) :
  '"C:/Users/ur21m712/AppData/Local/miniconda3"' not found
In addition: Warning messages:
1: In conda_binary(conda) :
  Supplied path is not a conda binary: ‘C:/Users/ur21m712/AppData/Local/miniconda3’
2: In conda_binary(conda) :
  Supplied path is not a conda binary: ‘C:/Users/ur21m712/AppData/Local/miniconda3’

reticulate::conda_list() when RETICULATE_MINICONDA_PATH=C:/Users/ur21m712/AppData/Local/miniconda3/bin/conda 🤔

                    name                                                                            python
1                   base                             C:\Users\ur21m712\AppData\Local\miniconda3/python.exe
2 grafzahl_condaenv_cuda C:\Users\ur21m712\AppData\Local\miniconda3\envs\grafzahl_condaenv_cuda/python.exe

ureber commented 1 year ago

No, wait! I don't know what made the difference, but now I consistently get the correct (?) output for the option RETICULATE_MINICONDA_PATH=C:/Users/ur21m712/AppData/Local/miniconda3/

                    name                                                                            python
1                   base                             C:\Users\ur21m712\AppData\Local\miniconda3/python.exe
2 grafzahl_condaenv_cuda C:\Users\ur21m712\AppData\Local\miniconda3\envs\grafzahl_condaenv_cuda/python.exe

The slashes are all mixed up. Maybe that's an issue?
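On the mixed slashes: Windows generally accepts both separators when opening files, so they mainly matter when paths are compared as plain strings. A minimal sketch of putting a path into one spelling before comparing follows; the helper name `to_forward_slashes` is hypothetical. On Windows, base R's `normalizePath(path, winslash = "/")` is another option.

```r
# Hypothetical helper: convert backslashes to forward slashes so that
# two spellings of the same Windows path compare equal as strings.
to_forward_slashes <- function(path) {
    gsub("\\", "/", path, fixed = TRUE)
}

# The mixed-separator path from the output above (backslashes doubled,
# as required inside an R string literal).
mixed <- "C:\\Users\\ur21m712\\AppData\\Local\\miniconda3/python.exe"
to_forward_slashes(mixed)
```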

chainsawriot commented 1 year ago

@ureber Could you please try this in your command line environment (cmd.exe)?

C:\Users\ur21m712\AppData\Local\miniconda3\envs\grafzahl_condaenv_cuda\python.exe

Do you get a Python console?


Short explanation: I don't have a Windows machine to test this package on, so Windows support is not as good. For example, I don't know whether the directory structure of a miniconda install on Windows differs from that on *nix systems.

detect_conda() is not a very robust test and might need to be modified for Windows.

ureber commented 1 year ago

Yes, I get a Python console (3.10.12).

I'm afraid I can't be of much help here, as my experience with conda on Windows is limited to this server (which is not mine).

chainsawriot commented 1 year ago

.is_windows <- function() {
    Sys.info()[['sysname']] == "Windows"
}

.gen_conda_path <- function(envvar = "GRAFZAHL_MINICONDA_PATH", bin = FALSE) {
    if (Sys.getenv(envvar) == "") {
        main_path <- reticulate::miniconda_path()
    } else {
        main_path <- Sys.getenv(envvar)
    }
    if (isFALSE(bin)) {
        return(main_path)
    }
    if (.is_windows()) {
        return(file.path(main_path, "Scripts", "conda.exe"))
    }
    file.path(main_path, "bin", "conda")
}

## list all conda envs, but restrict to .gen_conda_path
## Should err somehow
.list_condaenvs <- function() {
    all_condaenvs <- reticulate::conda_list(conda = .gen_conda_path(bin = TRUE))
    if (.is_windows()) {
        return(all_condaenvs$name)
    }
    all_condaenvs[grepl(.gen_conda_path(), all_condaenvs$python),]$name
}

.have_conda <- function() {
    ## !is.null(tryCatch(reticulate::conda_list(), error = function(e) NULL))
    ## Not a very robust test, but take it.
    file.exists(.gen_conda_path(bin = TRUE))
}

#' @rdname detect_cuda
#' @export
detect_conda <- function() {
    if(!.have_conda()) {
        return(FALSE)
    }
    envnames <- grep("^grafzahl_condaenv", .list_condaenvs(), value = TRUE)
    length(envnames) != 0
}

.gen_envname <- function(cuda = TRUE) {
    envname <- "grafzahl_condaenv"
    if (cuda) {
        envname <- paste0(envname, "_cuda")
    }
    return(envname)
}

.initialize_conda <- function(envname, verbose = FALSE) {
    if (is.null(getOption('python_init'))) {
        if (.is_windows()) {
            python_executable <- file.path(.gen_conda_path(), "envs", envname, "python.exe")
        } else {
            python_executable <- file.path(.gen_conda_path(), "envs", envname, "bin", "python")
        }
        ## Until rstudio/reticulate#1308 is fixed; mask it for now
        Sys.setenv(RETICULATE_MINICONDA_PATH = .gen_conda_path())
        reticulate::use_miniconda(python_executable, required = TRUE)
        options('python_init' = TRUE)
        if (verbose) {
            message("Conda environment ", envname, " is initialized.")
        }
    }
    return(invisible(NULL))
}

#' Detecting Miniconda And Cuda
#'
#' These functions detect miniconda and cuda.
#'
#' `detect_conda` conducts a test to check whether 1) a miniconda installation and 2) the grafzahl miniconda environment exist.
#' 
#' `detect_cuda` checks whether cuda is available. If `setup_grafzahl` was executed with `cuda` being `FALSE`, this function will return `FALSE`. Even if `setup_grafzahl` was executed with `cuda` being `TRUE` but with any factor that can't enable cuda (e.g. no Nvidia GPU, the environment was incorrectly created), this function will also return `FALSE`.
#' @return boolean, whether the system is available.
#' @export
detect_cuda <- function() {
    options('python_init' = NULL)
    if (Sys.getenv("KILL_SWITCH") == "KILL") {
        return(NA)
    }
    envnames <- grep("^grafzahl_condaenv", .list_condaenvs(), value = TRUE)
    if (length(envnames) == 0) {
        stop("No conda environment found. Run `setup_grafzahl` to bootstrap one.")
    }
    if ("grafzahl_condaenv_cuda" %in% envnames) {
        envname <- "grafzahl_condaenv_cuda"
    } else {
        envname <- "grafzahl_condaenv"
    }
    .initialize_conda(envname = envname, verbose = FALSE)
    reticulate::source_python(system.file("python", "st.py", package = "grafzahl"))
    return(py_detect_cuda())
}

.install_gpu_pytorch <- function(cuda_version) {
    .initialize_conda(.gen_envname(cuda = TRUE))
    conda_executable <- .gen_conda_path(bin = TRUE)
    status <- system2(conda_executable, args = c("install", "-n", .gen_envname(cuda = TRUE), "pytorch", "pytorch-cuda", paste0("cudatoolkit=", cuda_version), "-c", "pytorch", "-c", "nvidia", "-y"))
    if (status != 0) {
        stop("Cannot set up `pytorch`.")
    }    
    python_executable <- reticulate::py_config()$python
    status <- system2(python_executable, args = c("-m", "pip", "install", "simpletransformers"))
    if (status != 0) {
        stop("Cannot set up `simpletransformers`.")
    }    
}

#' Setup grafzahl
#'
#' Install a self-contained miniconda environment with all Python components (PyTorch, Transformers, Simpletransformers, etc.) that grafzahl requires. The default location is "~/.local/share/r-miniconda/envs/grafzahl_condaenv" (suffix "_cuda" is added if `cuda` is `TRUE`).
#' On Linux or Mac and if miniconda is not found, this function will also install miniconda. The path can be changed by the environment variable `GRAFZAHL_MINICONDA_PATH`.
#' @param cuda logical, whether a CUDA-enabled environment is wanted.
#' @param force logical, if `TRUE`, delete previous environment (if exists) and create a new environment
#' @param cuda_version character, indicate CUDA version, ignore if `cuda` is `FALSE`
#' @examples
#' # setup an environment with cuda enabled.
#' if (detect_conda() && interactive()) {
#'     setup_grafzahl(cuda = TRUE)
#' }
#' @return TRUE (invisibly) if installation is successful.
#' @export
setup_grafzahl <- function(cuda = FALSE, force = FALSE, cuda_version = "11.3") {
    envname <- .gen_envname(cuda = cuda)
    if (!.have_conda()) {
        if (!force) {
            message("No conda was found in ", .gen_conda_path())
            ans <- utils::menu(c("No", "Yes"), title = paste0("Do you want to install miniconda in ", .gen_conda_path()))
            if (ans == 1) {
                stop("Setup aborted.\n")
            }
        }
        reticulate::install_miniconda(.gen_conda_path(bin = FALSE), update = TRUE, force = TRUE)
    }
    allenvs <- .list_condaenvs()
    if (envname %in% allenvs && !force) {
        stop(paste0("Conda environment ", envname, " already exists.\nForce reinstallation by setting `force` to `TRUE`.\n"))
    }
    if (envname %in% allenvs && force) {
        reticulate::conda_remove(envname = envname, conda = .gen_conda_path(bin = TRUE))
    }    
    ## The actual installation
    ## https://github.com/rstudio/reticulate/issues/779
    ##conda_executable <- file.path(.gen_conda_path(), "bin/conda")
    if (isTRUE(cuda)) {
        yml_file <- "grafzahl_gpu.yml"
    } else {
        yml_file <- "grafzahl.yml"
    }
    status <- system2(.gen_conda_path(bin = TRUE), args = c("env", "create",  paste0("-f=", system.file(yml_file, package = 'grafzahl')), "-n", envname))
    if (status != 0) {
        stop("Cannot set up the basic conda environment.")
    }
    if (isTRUE(cuda)) {
        .install_gpu_pytorch(cuda_version = cuda_version)
    }
    ## Post-setup checks
    if (!detect_conda()) {
        stop("Conda can't be detected.")
    }
    if (detect_cuda() != cuda) {
        stop("Cuda wasn't configured correctly.")
    }
    ## Return TRUE invisibly, as documented in @return
    return(invisible(TRUE))
}

YAML

name: grafzahl
channels:
  - pytorch
  - conda-forge
  - anaconda
  - defaults
dependencies:
  - python=3.10
  - pip
  - pytorch>=1.6+cpuonly
  - pip:
    - pandas
    - tqdm  
    - simpletransformers
    - emoji==0.6.0
    - transformers==4.30.2
    - scipy==1.10.1

chainsawriot commented 1 year ago

@ureber Could you please give this a try (The GPU support for Windows is still WIP)?

remotes::install_github("chainsawriot/grafzahl@windows")

setup_grafzahl(cuda = FALSE)
detect_conda()
detect_cuda() # FALSE
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")

chainsawriot commented 1 year ago

If you'd rather not set up the conda environment afresh:

remotes::install_github("chainsawriot/grafzahl@windows")
require(grafzahl)
## Forward slashes: single backslashes are not valid in an R string literal
Sys.setenv("GRAFZAHL_MINICONDA_PATH" = "C:/Users/ur21m712/AppData/Local/miniconda3")
detect_conda()
detect_cuda()
model <- grafzahl(unciviltweets, model_type = "bertweet", model_name = "vinai/bertweet-base")
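A note on Windows paths in R code: inside an ordinary string literal the backslash starts an escape sequence, so a path written with single backslashes (such as the \U in C:\Users) is rejected by the parser. Below is a minimal sketch of three equivalent spellings, using the path reported earlier in this thread. The variable names are illustrative only.

```r
# "C:\Users\..." with single backslashes fails to parse, because \U
# starts a Unicode escape. Any of the following spellings works instead.
p_forward <- "C:/Users/ur21m712/AppData/Local/miniconda3"
p_escaped <- "C:\\Users\\ur21m712\\AppData\\Local\\miniconda3"
p_raw     <- r"(C:\Users\ur21m712\AppData\Local\miniconda3)"  # raw string, R >= 4.0.0

# The escaped and the raw spelling denote exactly the same string.
identical(p_escaped, p_raw)

Sys.setenv(GRAFZAHL_MINICONDA_PATH = p_forward)
```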

ureber commented 1 year ago

Thanks for the quick support. I tried both options. First the latter one, without a new installation of conda. As before, detect_conda() returned FALSE. I then ran the setup again, which produced the following error. Afterwards, however, detect_conda() returned TRUE.

Error in py_run_file_impl(file, local, convert) : ModuleNotFoundError: No module named 'transformers.models.mmbt'

chainsawriot commented 1 year ago

And yes, I found this issue while fixing this: ThilinaRajapakse/simpletransformers#1539

The CPU support in the pull request has a temporary fix (#25) for this.

To fix this in your GPU miniconda install now, run the following in your Conda console:

conda activate grafzahl_condaenv_cuda
python -m pip uninstall simpletransformers
python -m pip uninstall transformers
python -m pip uninstall scipy
python -m pip install simpletransformers "transformers==4.30.2" "scipy==1.10.1"
conda deactivate
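To confirm from R that the pinned versions above are in place, one option is to compare them against lines in `pip freeze` format. This is a rough sketch: the helper `check_pins` is hypothetical, and the example input is illustrative rather than output from this thread.

```r
# Hypothetical helper: given lines in `pip freeze` format ("name==version"),
# report whether each pinned package is present at the expected version.
check_pins <- function(freeze_lines, pins) {
    pinned <- freeze_lines[grepl("==", freeze_lines, fixed = TRUE)]
    parts <- strsplit(pinned, "==", fixed = TRUE)
    installed <- vapply(parts, function(x) x[2], character(1))
    names(installed) <- vapply(parts, function(x) x[1], character(1))
    ok <- installed[names(pins)] == pins
    ok[is.na(ok)] <- FALSE  # package missing from the freeze output
    names(ok) <- names(pins)
    ok
}

pins <- c(transformers = "4.30.2", scipy = "1.10.1")
# Illustrative input only; not real output from the environment discussed here.
example_lines <- c("scipy==1.10.1", "transformers==4.31.0")
check_pins(example_lines, pins)
```

In a real session the lines could come from `system2(python, c("-m", "pip", "freeze"), stdout = TRUE)`, where `python` is the environment's Python executable.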

ureber commented 1 year ago

This made it work! Both detect_conda() and detect_cuda() now return TRUE and also fine-tuning the model with grafzahl() works. I'm not entirely sure which installation it's running on, although judging by the speed, it's probably the GPU one.
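On the question of which installation is in use: going by the source posted above, detect_cuda() picks grafzahl_condaenv_cuda whenever that environment exists, and the Python executable path reveals the environment name. A minimal sketch of extracting it follows; the helper `env_from_python` is hypothetical, and the path would typically come from `reticulate::py_config()$python`.

```r
# Hypothetical helper: extract the conda environment name from the path of
# a Python executable; returns "base" when the path is not under envs/.
env_from_python <- function(python) {
    unified <- gsub("\\", "/", python, fixed = TRUE)
    parts <- strsplit(unified, "/", fixed = TRUE)[[1]]
    i <- which(parts == "envs")
    if (length(i) == 0 || max(i) == length(parts)) {
        return("base")
    }
    parts[max(i) + 1]
}

# Path as listed by reticulate::conda_list() earlier in this thread.
env_from_python("C:\\Users\\ur21m712\\AppData\\Local\\miniconda3\\envs\\grafzahl_condaenv_cuda/python.exe")
```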