OscarKjell / text

Using Transformers from HuggingFace in R
https://r-text.org

Error while running textrpp_install() #96

Open cenotechnology opened 10 months ago

cenotechnology commented 10 months ago

Hi there, thanks for developing this program; I highly appreciate your contribution.

The installation was interrupted, and the system showed the following message:

`Building wheel for tokenizers (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [616 lines of output]
    running bdist_wheel
    running build
    running build_py`

ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Error: Error installing package(s): "'torch==2.0.0'", "'transformers==4.19.2'", "numpy", "pandas", "'nltk==3.6.7'", "scikit-learn", "'datasets==2.9.0'", "evaluate"

Many thanks for your support.

cenotechnology commented 10 months ago

I should add that I followed the instructions provided online and ran `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`, but the problem persists. Kind regards!
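The rustup script above installs the Rust toolchain (`cargo`, `rustc`). A minimal sketch for checking whether those tools are visible to the current R session follows. `tool_on_path` is a hypothetical helper name, not part of text or reticulate, and the thread does not confirm that a missing Rust toolchain caused the tokenizers build failure.

```r
# Hypothetical helper (not part of text or reticulate): TRUE when an
# executable named `tool` is found on the PATH that this R session sees.
tool_on_path <- function(tool) {
  unname(nzchar(Sys.which(tool)))
}

# Report the two tools that rustup installs.
for (tool in c("cargo", "rustc")) {
  message(tool, " visible to R: ", tool_on_path(tool))
}
```

If either tool is reported as not visible even though rustup finished successfully, the R session may have been started with a different PATH than the terminal; that is an assumption worth checking, not a diagnosis.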

moomoofarm1 commented 10 months ago

Could you run `reticulate::py_list_packages("textrpp_condaenv")` and paste the output here?

BTW, please also run `sessionInfo()` and paste its output here.
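The two diagnostics requested above can be gathered into one pasteable report. This is a small sketch; `collect_diagnostics` is a hypothetical helper name, and its `list_packages` argument exists only so the function can be tried without a conda environment.

```r
# Hypothetical helper: returns the Python package list of a conda
# environment and the R session info as one character vector.
collect_diagnostics <- function(envname = "textrpp_condaenv",
                                list_packages = reticulate::py_list_packages) {
  c(
    sprintf("## Python packages in %s", envname),
    utils::capture.output(print(list_packages(envname))),
    "",
    "## sessionInfo()",
    utils::capture.output(print(utils::sessionInfo()))
  )
}

# Usage (requires reticulate and the textrpp_condaenv environment):
# writeLines(collect_diagnostics())
```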

cenotechnology commented 10 months ago

Many thanks for your prompt response. Here is the output of the first command:

> reticulate::py_list_packages("textrpp_condaenv")
    package          version     requirement                 channel
1   ca-certificates  2023.11.17  ca-certificates=2023.11.17  conda-forge
2   libcxx           16.0.6      libcxx=16.0.6               conda-forge
3   libffi           3.3         libffi=3.3                  conda-forge
4   libsqlite        3.44.2      libsqlite=3.44.2            conda-forge
5   libzlib          1.2.13      libzlib=1.2.13              conda-forge
6   ncurses          6.4         ncurses=6.4                 conda-forge
7   openssl          1.1.1w      openssl=1.1.1w              conda-forge
8   pip              23.3.1      pip=23.3.1                  conda-forge
9   python           3.9.0       python=3.9.0                conda-forge
10  readline         8.2         readline=8.2                conda-forge
11  setuptools       68.2.2      setuptools=68.2.2           conda-forge
12  sqlite           3.44.2      sqlite=3.44.2               conda-forge
13  tk               8.6.13      tk=8.6.13                   conda-forge
14  tzdata           2023c       tzdata=2023c                conda-forge
15  wheel            0.42.0      wheel=0.42.0                conda-forge
16  xz               5.2.6       xz=5.2.6                    conda-forge
17  zlib             1.2.13      zlib=1.2.13                 conda-forge

Here is the output of the second command:

`> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reticulate_1.34.0 text_1.0

loaded via a namespace (and not attached):
[1] gtable_0.3.4 ggplot2_3.4.4 recipes_1.0.8 overlapping_2.1 lattice_0.22-5 vctrs_0.6.4
[7] tools_4.3.2 generics_0.1.3 parallel_4.3.2 tibble_3.2.1 fansi_1.0.5 pkgconfig_2.0.3
[13] Matrix_1.6-3 data.table_1.14.8 lhs_1.1.6 GPfit_1.0-8 lifecycle_1.0.4 compiler_4.3.2
[19] brio_1.1.3 munsell_0.5.0 codetools_0.2-19 DiceDesign_1.9 class_7.3-22 tune_1.1.2
[25] prodlim_2023.08.28 pillar_1.9.0 furrr_0.3.1 tidyr_1.3.0 MASS_7.3-60 gower_1.0.1
[31] yardstick_1.2.0 iterators_1.0.14 foreach_1.5.2 rpart_4.1.21 parallelly_1.36.0 lava_1.7.3
[37] dials_1.2.0 tidyselect_1.2.0 digest_0.6.33 stringi_1.8.2 future_1.33.0 dplyr_1.1.4
[43] purrr_1.0.2 listenv_0.9.0 splines_4.3.2 cowplot_1.1.1 parsnip_1.1.1 grid_4.3.2
[49] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 survival_3.5-7 utf8_1.2.4 future.apply_1.11.0
[55] withr_2.5.2 scales_1.2.1 lubridate_1.9.3 timechange_0.2.0 globals_0.16.2 nnet_7.3-19
[61] timeDate_4022.108 png_0.1-8 workflows_1.1.3 testthat_3.2.0 hardhat_1.3.0 rsample_1.2.0
[67] rlang_1.1.2 Rcpp_1.0.11 glue_1.6.2 ipred_0.9-14 rstudioapi_0.15.0 jsonlite_1.8.7
[73] R6_2.5.1`

Once again, thanks for your help.

moomoofarm1 commented 10 months ago

It seems your environment is set up correctly. You do not need to run the curl command manually; installing the text package runs it automatically. Please try reinstalling the text package by following the steps below.

  1. First, uninstall Miniconda by manually deleting the folder returned by `reticulate::miniconda_path()`. It should look similar to /Users/macID/Library/r-miniconda-arm64, inside the Library folder (replace macID with your Mac user name).
  2. Uninstall the text package (if it is installed).
  3. Install the GitHub version by running `install.packages("devtools")` and then `devtools::install_github("oscarkjell/text")`.
  4. Follow the steps in the extended installation guide.
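
Steps 1 to 3 above can be sketched in R as follows. `reinstall_text` is a hypothetical helper, not part of the text package; it defaults to a dry run that only prints what it would do, because step 1 deletes a folder. Check the printed path before running it with `dry_run = FALSE`.

```r
# Hypothetical helper: walks through reinstall steps 1-3.
# With dry_run = TRUE (the default) nothing is deleted or installed.
reinstall_text <- function(miniconda_dir = reticulate::miniconda_path(),
                           dry_run = TRUE) {
  message("Miniconda folder: ", miniconda_dir)
  steps <- list(
    "1. delete the Miniconda folder" = function() {
      unlink(miniconda_dir, recursive = TRUE)
    },
    "2. uninstall text (if installed)" = function() {
      if ("text" %in% rownames(utils::installed.packages())) {
        utils::remove.packages("text")
      }
    },
    "3a. install devtools" = function() {
      utils::install.packages("devtools")
    },
    "3b. install text from GitHub" = function() {
      devtools::install_github("oscarkjell/text")
    }
  )
  for (name in names(steps)) {
    message(if (dry_run) "[dry run] " else "", name)
    if (!dry_run) steps[[name]]()
  }
  invisible(names(steps))
}

# reinstall_text()                 # print the plan only
# reinstall_text(dry_run = FALSE)  # actually perform steps 1-3
```

Step 4 is not covered here; going by this issue's title, it involves calling `textrpp_install()` from the text package.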

cenotechnology commented 10 months ago

Many thanks for your support, but the problem persists. Perhaps it is a compatibility issue.

moomoofarm1 commented 10 months ago

Had you installed any version of conda, especially Anaconda or Miniconda, before installing the text package? The incompatibility may be due to an incorrect path to the Python package installer.

cenotechnology commented 10 months ago

Thanks, Moomoofarm1. I did not have any other version of Anaconda or Miniconda on the laptop. I think it has to do with macOS. This morning I tried to install it on a Windows laptop, and it was successful. My macOS version is 14.1.1 (23B81), just for reference. Have a great day!

moomoofarm1 commented 10 months ago

Nice to hear that it worked. If the incompatibility shows up again, please paste the message printed by `reticulate::py_last_error()`. It might help.

cenotechnology commented 10 months ago

Many thanks. I ran it, but got NULL as output.

'ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Error: Error installing package(s): "'torch==2.0.0'", "'transformers==4.19.2'", "numpy", "pandas", "'nltk==3.6.7'", "scikit-learn", "'datasets==2.9.0'", "evaluate"

reticulate::py_last_error()
NULL
reticulate::py_last_error()
NULL'

moomoofarm1 commented 10 months ago

Installing tokenizers directly might help:
reticulate::conda_install(envname="textrpp_condaenv", packages=c("tokenizers==0.13.1"), pip=TRUE)

cenotechnology commented 10 months ago

Finally! Many thanks, it works. However, I also had to install the other packages "manually". I still had some issues when I tried to install transformers 4.19.2; after I changed it to 4.35.2, everything worked. The system successfully initialised textrpp, and I successfully tested textEmbed("hello"). Here are the packages that I installed "manually":
reticulate::conda_install(envname="textrpp_condaenv", packages=c("tokenizers==0.13.1"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("torch==2.0.0"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("nltk==3.6.7"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("datasets==2.9.0"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("transformers==4.35.2"), pip=TRUE)

Once again many thanks for your support and patience!
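
The manual installs listed above can be written as a loop. This is a sketch only: `install_pinned` is a hypothetical helper, and the pinned versions are simply the ones reported to work in this comment (macOS 14.1.1 on Apple Silicon), not an official recommendation. The `installer` argument defaults to `reticulate::conda_install` and exists so the loop can be tried without touching a real environment.

```r
# Versions reported to work in this thread, in the order they were installed.
pinned_packages <- c(
  "tokenizers==0.13.1",
  "torch==2.0.0",
  "nltk==3.6.7",
  "datasets==2.9.0",
  "transformers==4.35.2"
)

# Hypothetical helper: installs each package with pip into the conda
# environment, one at a time, so a failure points at a single package.
install_pinned <- function(packages,
                           envname = "textrpp_condaenv",
                           installer = reticulate::conda_install) {
  for (pkg in packages) {
    message("Installing ", pkg, " into ", envname)
    installer(envname = envname, packages = pkg, pip = TRUE)
  }
  invisible(packages)
}

# install_pinned(pinned_packages)
```

Installing one package per call mirrors what was done above; it is slower than a single call with all packages, but the first failing package is easy to identify.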

moomoofarm1 commented 10 months ago

I will update the code to automate the installation if necessary in the future.

MatthBogaert commented 8 months ago

FYI, I tried to install the package on my MacBook with an M2 processor using the normal procedure, and the same issue occurred. I then installed the packages manually as described by the OP, and this worked. Maybe this could be added to the advanced installation guide for Apple M1/M2 users?
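
If the guide were to single out Apple M1/M2 users, the check could look like the sketch below. `is_apple_silicon` is a hypothetical helper; it relies on `Sys.info()` reporting `"Darwin"` and `"arm64"`, which is an assumption about a native arm64 build of R (an x86_64 build of R running under Rosetta would not match).

```r
# Hypothetical helper: TRUE when R reports macOS on an arm64 machine.
# `info` defaults to Sys.info() and can be overridden for testing.
is_apple_silicon <- function(info = Sys.info()) {
  identical(info[["sysname"]], "Darwin") &&
    identical(info[["machine"]], "arm64")
}

if (is_apple_silicon()) {
  message("Apple Silicon detected: if textrpp_install() fails while building ",
          "tokenizers, try installing the Python packages one at a time.")
}
```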