bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

keywords_phrases() returns empty on package example #67

Closed leungi closed 4 years ago

leungi commented 4 years ago

The same code works on Windows but fails on CentOS.

I checked the intermediate inputs (x$phrase_tag and x$token), and they are identical on both machines.

Windows

library(udpipe)
#> Warning: package 'udpipe' was built under R version 3.5.3
data(brussels_reviews_anno, package = "udpipe")
x <- subset(brussels_reviews_anno, language %in% "fr")

## Find noun phrases with the following regular expression: (A|N)+N(P+D*(A|N)*N)*
x$phrase_tag <- as_phrasemachine(x$xpos, type = "penn-treebank")
#> Warning in as_phrasemachine(x$xpos, type = "penn-treebank"): x should
#> contain only these tags: CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD,
#> NN, NNP, NNPS, NNS, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH,
#> VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB, the following are not
#> recognised tags: - , . : ( )
nounphrases <- keywords_phrases(x$phrase_tag, term = x$token, 
                                pattern = "(A|N)+N(P+D*(A|N)*N)*", is_regex = TRUE, 
                                ngram_max = 4, 
                                detailed = TRUE)
head(nounphrases, 10)
#>                     keyword ngram pattern start end
#> 1            excellent week     2      AN     2   3
#> 2                cett Jolie     2      NN    34  35
#> 3          cett Jolie ville     3     NNN    34  36
#> 4               Jolie ville     2      NN    35  36
#> 5  Jolie ville de Bruxelles     4    NNPN    35  38
#> 6           jolies quartier     2      AN    57  58
#> 7              super sejour     2      NN    66  67
#> 8               beau soleil     2      AN    70  71
#> 9      beau soleil en prime     4    ANPN    70  73
#> 10             centre ville     2      NN    89  90

Created on 2019-10-15 by the reprex package (v0.2.1)
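To make the `start`/`end` columns above concrete, here is a minimal C++ sketch of one way to find phrases: try every run of consecutive one-letter phrasemachine tags (up to `ngram_max` tokens) and keep those whose concatenation fully matches the pattern. This is an illustration under that assumption, not udpipe's actual implementation, and `find_phrase_spans` is a hypothetical name. The overlapping rows in the Windows output (34-35, 34-36, 35-36, 35-38) are consistent with this all-n-grams approach.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not udpipe's code): returns every (start, end) span,
// 1-based and inclusive like the start/end columns above, of consecutive
// tags whose concatenation fully matches `pattern`, for spans of at most
// `ngram_max` tokens.
std::vector<std::pair<std::size_t, std::size_t>>
find_phrase_spans(const std::vector<std::string>& tags,
                  const std::string& pattern, std::size_t ngram_max) {
    const std::regex re(pattern);
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        std::string joined;
        for (std::size_t n = 1; n <= ngram_max && i + n <= tags.size(); ++n) {
            joined += tags[i + n - 1];           // grow the candidate n-gram by one tag
            if (std::regex_match(joined, re)) {  // the whole n-gram must match
                spans.emplace_back(i + 1, i + n);
            }
        }
    }
    return spans;
}
```

For the tags of "cett Jolie ville de Bruxelles" (N N N P N) with the pattern above and `ngram_max = 4`, this yields (1,2), (1,3), (2,3) and (2,5), which are rows 2 to 5 of the Windows output shifted by 33. On a toolchain whose `std::regex` never matches, the same loop would silently return nothing.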

CentOS

> library(udpipe)
>
> data(brussels_reviews_anno, package = "udpipe")
> x <- subset(brussels_reviews_anno, language %in% "fr")
> x$phrase_tag <- as_phrasemachine(x$xpos, type = "penn-treebank")
Warning message:
In as_phrasemachine(x$xpos, type = "penn-treebank") :
  x should contain only these tags: CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD, NN, NNP, NNPS, NNS, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB, the following are not recognised tags: - , . : ( )
> nounphrases <- keywords_phrases(x$phrase_tag, term = x$token,
+                                 pattern = "(A|N)+N(P+D*(A|N)*N)*", is_regex = TRUE,
+                                 ngram_max = 4,
+                                 detailed = TRUE)
> head(nounphrases, 10)
[1] keyword ngram   pattern start   end
<0 rows> (or 0-length row.names)

Machine Info

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] data.table_1.12.2    RODBC_1.3-15         ROracle_1.3-1
 [4] DBI_1.0.0            packcircles_0.3.3    emojifont_0.5.2
 [7] shinydashboard_0.7.1 shiny_1.3.2          plotly_4.9.0
[10] waiter_0.0.4         DT_0.6               udpipe_0.8.1
[13] kableExtra_1.1.0     formattable_0.2.0.1  rmarkdown_1.14
[16] tictoc_1.0           janitor_1.2.0        readxl_1.3.1
[19] lubridate_1.7.4      forcats_0.4.0        stringr_1.4.0
[22] dplyr_0.8.0.1        purrr_0.3.2          readr_1.3.1
[25] tidyr_0.8.3          tibble_2.1.1         ggplot2_3.1.1
[28] tidyverse_1.2.1
jwijffels commented 4 years ago

Same question as here: https://github.com/bnosac/udpipe/issues/20. keywords_phrases uses C++11 <regex>, which was only implemented in GCC 4.9.0. So you need at least gcc 4.9.0 for this to work; once you have installed gcc >= 4.9.0, reinstall the udpipe package. You can check which version of gcc you have on CentOS with gcc --version.

See also https://stackoverflow.com/questions/12530406/is-gcc-4-8-or-earlier-buggy-about-regular-expressions
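One way to check whether a given toolchain has a working `<regex>` is to compile and run a tiny probe with the same compiler that R uses. Below is a minimal sketch, assuming that on GCC < 4.9 the header compiles but matching either throws `std::regex_error` or simply fails to match; `std_regex_works` is a hypothetical helper, not part of udpipe.

```cpp
#include <regex>
#include <string>

// Hypothetical probe: true only if std::regex both matches and rejects
// as expected. On libstdc++ before GCC 4.9 the <regex> header compiles,
// but matching is not implemented, so this is expected to return false
// (a thrown std::regex_error is also treated as "not working").
bool std_regex_works() {
    try {
        // Same noun-phrase pattern as keywords_phrases() in this issue.
        const std::regex noun_phrase("(A|N)+N(P+D*(A|N)*N)*");
        // "ANPN" (adjective, noun, preposition, noun) should fully match;
        // "PN" should not.
        return std::regex_match(std::string("ANPN"), noun_phrase) &&
               !std::regex_match(std::string("PN"), noun_phrase);
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Call it from a small `main` (for example `return std_regex_works() ? 0 : 1;`) built with the compiler R is configured to use, e.g. `g++ -std=c++11`. With GCC 4.9 or later it should return true.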

leungi commented 4 years ago

Thanks for the prompt reply and pointers, @jwijffels.

A new issue after upgrading gcc and g++: I am unable to reinstall udpipe as you suggested in #20.

[leungi@ohylpyt1-d ~]$ gcc --version
gcc (GCC) 4.9.3
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[leungi@ohylpyt1-d ~]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/gcc493/libexec/gcc/x86_64-unknown-linux-gnu/4.9.3/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.3/configure --prefix=/usr/local/gcc493 --program-suffix=49 --enable-languages=c,c++ --disable-libstdcxx-pch --disable-multilib
Thread model: posix
gcc version 4.9.3 (GCC)
[leungi@ohylpyt1-d ~]$ sudo R
[sudo] password for leungi:
Account with conflicting name (leungi) exists locally

R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages("udpipe")
* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
udpipe.cpp: In static member function ‘static bool ufal::udpipe::morphodita::gru_tokenizer_trainer::train(unsigned int, unsigned int, bool, unsigned int, unsigned int, unsigned int, float, float, float, float, bool, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, std::ostream&, std::string&)’:
udpipe.cpp:13510:80: error: ‘to_string’ was not declared in this scope
     return error.assign("Gru tokenizer dimension '").append(to_string(dimension)).append("' is not supported!"), false;
                                                                                ^
udpipe.cpp: In static member function ‘static void ufal::udpipe::parsito::parser_nn_trainer::train(const string&, const string&, bool, const string&, const string&, const ufal::udpipe::parsito::network_parameters&, unsigned int, const std::vector<ufal::udpipe::parsito::tree>&, const std::vector<ufal::udpipe::parsito::tree>&, ufal::udpipe::utils::binary_encoder&)’:
udpipe.cpp:16453:73: error: ‘to_string’ was not declared in this scope
         embeddings_from_file_comment = "[dim" + to_string(file_dimension) + "->" + to_string(dimension) + "]";
                                                                         ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::parsito::tree_input_format_conllu::next_tree(ufal::udpipe::parsito::tree&)’:
udpipe.cpp:17913:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(node.id)).append("' form '").append(node.form).append("' has too large head: '").append(to_string(node.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual void ufal::udpipe::parsito::tree_output_format_conllu::write_tree(const ufal::udpipe::parsito::tree&, std::string&, const ufal::udpipe::parsito::tree_input_format*) const’:
udpipe.cpp:17947:30: error: ‘to_string’ was not declared in this scope
     output.append(to_string(i)).push_back('\t');
                              ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_conllu::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18218:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(word.id)).append("' form '").append(word.form).append("' has too large head: '").append(to_string(word.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_horizontal::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18314:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_vertical::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18404:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_presegmented_tokenizer::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18518:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘void ufal::udpipe::token::set_token_range(size_t, size_t)’:
udpipe.cpp:19369:58: error: ‘to_string’ was not declared in this scope
     start_misc_field("TokenRange").append(to_string(start)).append(1, ':').append(to_string(end));
                                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::morphodita_tokenizer_wrapper::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:19793:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_tagger(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21133:61: error: ‘to_string’ was not declared in this scope
         model_name = "from_model_" + to_string(++model_index);
                                                             ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_parser(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21194:100: error: ‘to_string’ was not declared in this scope
       if (embedding_upostag) embeddings.append("universal_tag ").append(to_string(embedding_upostag)).append(" 1\n");
                                                                                                    ^
udpipe.cpp:21195:88: error: ‘to_string’ was not declared in this scope
       if (embedding_feats) embeddings.append("feats ").append(to_string(embedding_feats)).append(" 1\n");
                                                                                        ^
udpipe.cpp:21196:90: error: ‘to_string’ was not declared in this scope
       if (embedding_xpostag) embeddings.append("tag ").append(to_string(embedding_xpostag)).append(" 1\n");
                                                                                          ^
udpipe.cpp:21198:67: error: ‘to_string’ was not declared in this scope
         embeddings.append("form ").append(to_string(embedding_form)).append(" ").append(to_string(embedding_form_mincount));
                                                                   ^
udpipe.cpp:21203:69: error: ‘to_string’ was not declared in this scope
         embeddings.append("lemma ").append(to_string(embedding_lemma)).append(" ").append(to_string(embedding_lemma_mincount));
                                                                     ^
udpipe.cpp:21207:91: error: ‘to_string’ was not declared in this scope
       if (embedding_deprel) embeddings.append("deprel ").append(to_string(embedding_deprel)).append(" 1\n");
                                                                                           ^
make: *** [udpipe.o] Error 1
ERROR: compilation failed for package ‘udpipe’
* removing ‘/usr/lib64/R/library/udpipe’
* restoring previous ‘/usr/lib64/R/library/udpipe’

The downloaded source packages are in
        ‘/tmp/Rtmpp6V9ER/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("udpipe") :
  installation of package ‘udpipe’ had non-zero exit status
jwijffels commented 4 years ago

That looks like you have not compiled the package with C++11 support: your log does not show g++ -std=c++11 but g++ -std=gnu++11. Maybe this is because you have 2 compilers on your system and R only knows about the previous one (PKG_CXXFLAGS is probably pointing to your old compiler).
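To separate a udpipe problem from a toolchain problem, one option is to compile a tiny file that uses `std::to_string` the same way the failing lines do, using exactly the compiler and flags from the log. Below is a minimal sketch; `dimension_comment` and `language_mode` are hypothetical names. It also reports the language mode: with GCC, `__cplusplus` is `201103L` for C++11 under both `-std=c++11` and `-std=gnu++11`, and `__STRICT_ANSI__` is defined only in the strict (non-GNU) mode.

```cpp
#include <string>

// Hypothetical probe mirroring the failing construct at udpipe.cpp:16453:
// std::to_string is declared in <string> from C++11 onwards.
std::string dimension_comment(int file_dimension, int dimension) {
    return "[dim" + std::to_string(file_dimension) + "->" +
           std::to_string(dimension) + "]";
}

// Hypothetical probe reporting the language mode the compiler is in.
std::string language_mode() {
    std::string mode = "__cplusplus=" + std::to_string(__cplusplus);
#ifdef __STRICT_ANSI__
    mode += " (strict ISO mode, e.g. -std=c++11)";
#else
    mode += " (GNU extensions enabled, e.g. -std=gnu++11)";
#endif
    return mode;
}
```

If even this file fails with "'to_string' was not declared in this scope" under the same command line, the problem most likely lies in that compiler's C++ standard library setup rather than in udpipe itself.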

leungi commented 4 years ago

Thanks again for the prompt reply, @jwijffels.

I tried a few ideas from SO, but to no avail.

Could you please advise how to point R to the desired compiler? 🙏

# trial 1
Sys.setenv("PKG_CXXFLAGS"="-g -std=c++11")

# trial 2
Sys.setenv("PKG_CXXFLAGS"="-std=c++11")

# trial 3
# adapted from {rstan} install
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, "Makevars")
file.remove(M)
if (!file.exists(M)) file.create(M)
cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC",
    "CXX14=g++ -std=c++11",
    "CXX14FLAGS+= -std=c++11",
    "CXX14STD='-std=c++11'", # or clang++ but you may need a version postfix
    file = M, sep = "\n", append = TRUE)
jwijffels commented 4 years ago

Helping with installing a recent compiler on Red Hat is a bit out of scope for this GitHub issue. As you have 2 C++ compilers in your environment (/usr/local/gcc493 and the default location of g++), you will have to tell R which one to use when you install udpipe. You can do that by setting the right directives in ~/.R/Makevars. These are explained at https://stackoverflow.com/questions/43597632/understanding-the-contents-of-the-makevars-file-in-r-macros-variables-r-ma

leungi commented 4 years ago

Appreciate the pointers 👍

Keeping 2 compilers wasn't my intention; I wish the older version had been overwritten to avoid this headache!

I will close the loop once I have confirmed that reinstalling udpipe with gcc >= 4.9.0 solves the issue.

leungi commented 4 years ago

@jwijffels: that was a useful SO guide 👍

I followed the guide and managed to set the compiler of choice, but I still get a compile error:

[leungi@ohylpyt1-d]$ which g++
/bin/g++
[leungi@ohylpyt1-d]$ cd /bin/
[leungi@ohylpyt1-d /bin]$ find . -maxdepth 1 -type l -ls | grep "g++"
71073498    0 lrwxrwxrwx   1 root     root           14 Oct 16 07:51 ./g++ -> /usr/bin/g++49
71073708    0 lrwxrwxrwx   1 root     root           25 Oct 16 07:34 ./g++49 -> ../local/gcc493/bin/g++49
> dotR <- file.path(Sys.getenv("HOME"), ".R")
> if (!file.exists(dotR)) dir.create(dotR)
> M <- file.path(dotR, "Makevars")
> if (!file.exists(M)) file.create(M)
[1] TRUE
> cat("\nCXX = g++",
+ "CXX11STD = -std=c++11",
+     file = M, sep = "\n", append = TRUE)
> readLines(M)
[1] ""
[2] "CXX = g++"
[3] "CXX11STD = -std=c++11"
> install.packages("udpipe")
Installing package into ‘/usr/lib64/R/library’
trying URL 'https://cloud.r-project.org/src/contrib/udpipe_0.8.3.tar.gz'
Content type 'application/x-gzip' length 4750855 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
udpipe.cpp: In static member function ‘static bool ufal::udpipe::morphodita::gru_tokenizer_trainer::train(unsigned int, unsigned int, bool, unsigned int, unsigned int, unsigned int, float, float, float, float, bool, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, std::ostream&, std::string&)’:
udpipe.cpp:13510:80: error: ‘to_string’ was not declared in this scope
     return error.assign("Gru tokenizer dimension '").append(to_string(dimension)).append("' is not supported!"), false;
                                                                                ^
udpipe.cpp: In static member function ‘static void ufal::udpipe::parsito::parser_nn_trainer::train(const string&, const string&, bool, const string&, const string&, const ufal::udpipe::parsito::network_parameters&, unsigned int, const std::vector<ufal::udpipe::parsito::tree>&, const std::vector<ufal::udpipe::parsito::tree>&, ufal::udpipe::utils::binary_encoder&)’:
udpipe.cpp:16453:73: error: ‘to_string’ was not declared in this scope
         embeddings_from_file_comment = "[dim" + to_string(file_dimension) + "->" + to_string(dimension) + "]";
                                                                         ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::parsito::tree_input_format_conllu::next_tree(ufal::udpipe::parsito::tree&)’:
udpipe.cpp:17913:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(node.id)).append("' form '").append(node.form).append("' has too large head: '").append(to_string(node.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual void ufal::udpipe::parsito::tree_output_format_conllu::write_tree(const ufal::udpipe::parsito::tree&, std::string&, const ufal::udpipe::parsito::tree_input_format*) const’:
udpipe.cpp:17947:30: error: ‘to_string’ was not declared in this scope
     output.append(to_string(i)).push_back('\t');
                              ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_conllu::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18218:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(word.id)).append("' form '").append(word.form).append("' has too large head: '").append(to_string(word.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_horizontal::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18314:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_vertical::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18404:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_presegmented_tokenizer::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18518:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘void ufal::udpipe::token::set_token_range(size_t, size_t)’:
udpipe.cpp:19369:58: error: ‘to_string’ was not declared in this scope
     start_misc_field("TokenRange").append(to_string(start)).append(1, ':').append(to_string(end));
                                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::morphodita_tokenizer_wrapper::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:19793:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_tagger(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21133:61: error: ‘to_string’ was not declared in this scope
         model_name = "from_model_" + to_string(++model_index);
                                                             ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_parser(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21194:100: error: ‘to_string’ was not declared in this scope
       if (embedding_upostag) embeddings.append("universal_tag ").append(to_string(embedding_upostag)).append(" 1\n");
                                                                                                    ^
udpipe.cpp:21195:88: error: ‘to_string’ was not declared in this scope
       if (embedding_feats) embeddings.append("feats ").append(to_string(embedding_feats)).append(" 1\n");
                                                                                        ^
udpipe.cpp:21196:90: error: ‘to_string’ was not declared in this scope
       if (embedding_xpostag) embeddings.append("tag ").append(to_string(embedding_xpostag)).append(" 1\n");
                                                                                          ^
udpipe.cpp:21198:67: error: ‘to_string’ was not declared in this scope
         embeddings.append("form ").append(to_string(embedding_form)).append(" ").append(to_string(embedding_form_mincount));
                                                                   ^
udpipe.cpp:21203:69: error: ‘to_string’ was not declared in this scope
         embeddings.append("lemma ").append(to_string(embedding_lemma)).append(" ").append(to_string(embedding_lemma_mincount));
                                                                     ^
udpipe.cpp:21207:91: error: ‘to_string’ was not declared in this scope
       if (embedding_deprel) embeddings.append("deprel ").append(to_string(embedding_deprel)).append(" 1\n");
                                                                                           ^
make: *** [udpipe.o] Error 1
ERROR: compilation failed for package ‘udpipe’
* removing ‘/usr/lib64/R/library/udpipe’
* restoring previous ‘/usr/lib64/R/library/udpipe’

The downloaded source packages are in
        ‘/tmp/Rtmpnx42Mk/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("udpipe") :
  installation of package ‘udpipe’ had non-zero exit status
jwijffels commented 4 years ago

Did you set the exact location of CXX11 in your Makevars file (e.g. your local/gcc493/bin/g++49 path)?
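To confirm which compiler R actually invokes after editing Makevars, a probe built with the same `CXX11` command can report its own identity. Below is a minimal sketch based on the predefined macros `__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__` and `__clang__`; `compiler_id` and `gcc_at_least` are hypothetical names. It uses `std::ostringstream` rather than `std::to_string` so that it still compiles on a toolchain where `to_string` is missing.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: names the compiler that built this translation unit.
// Clang also defines __GNUC__ (as 4.2), so it is checked first.
std::string compiler_id() {
    std::ostringstream out;
#if defined(__clang__)
    out << "clang " << __clang_major__ << '.' << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << '.' << __GNUC_MINOR__ << '.' << __GNUC_PATCHLEVEL__;
#else
    out << "unknown";
#endif
    return out.str();
}

// Hypothetical helper: true only when built by GCC >= major.minor,
// e.g. gcc_at_least(4, 9) for a working std::regex.
bool gcc_at_least(int major, int minor) {
#if defined(__GNUC__) && !defined(__clang__)
    return __GNUC__ > major || (__GNUC__ == major && __GNUC_MINOR__ >= minor);
#else
    (void)major;
    (void)minor;
    return false;
#endif
}
```

For example, compile it with `/usr/local/gcc493/bin/g++49 -std=c++11` and print `compiler_id()` from a small `main`; that compiler should report `gcc 4.9.3`.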

leungi commented 4 years ago

@jwijffels, thanks for your continued support.

I tried your suggestion to explicitly set the g++49 path, but still no luck. I also tried c++49, which is also available in /usr/local/gcc493/bin, with the same outcome.

[leungi@ohylpyt1-d bin]$ pwd
/usr/local/gcc493/bin
[leungi@ohylpyt1-d bin]$ ls -ls
total 3280
800 -rwxr-xr-x. 1 root root 816792 Nov 20  2017 c++49
800 -rwxr-xr-x. 1 root root 815128 Nov 20  2017 cpp49
800 -rwxr-xr-x. 1 root root 816792 Nov 20  2017 g++49
796 -rwxr-xr-x. 1 root root 814104 Nov 20  2017 gcc49
 28 -rwxr-xr-x. 1 root root  25184 Nov 20  2017 gcc-ar49
 28 -rwxr-xr-x. 1 root root  25120 Nov 20  2017 gcc-nm49
 28 -rwxr-xr-x. 1 root root  25120 Nov 20  2017 gcc-ranlib49
[leungi@ohylpyt1-d bin]$ R
> dotR <- file.path(Sys.getenv("HOME"), ".R")
> if (!file.exists(dotR)) dir.create(dotR)
> M <- file.path(dotR, "Makevars")
> if (!file.exists(M)) file.create(M)
[1] TRUE
> cat("\nCXX11 = /usr/local/gcc493/bin/g++49", "CXX11STD = -std=c++11", file = M, sep = "\n", append = TRUE)
> readLines(M)
[1] ""                                    "CXX11 = /usr/local/gcc493/bin/g++49"
[3] "CXX11STD = -std=c++11"
> install.packages("udpipe")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/udpipe_0.8.3.tar.gz'
Content type 'application/x-gzip' length 4750855 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
/usr/local/gcc493/bin/g++49 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
/usr/local/gcc493/bin/g++49 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
/usr/local/gcc493/bin/g++49 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
/usr/local/gcc493/bin/g++49 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
udpipe.cpp: In static member function ‘static bool ufal::udpipe::morphodita::gru_tokenizer_trainer::train(unsigned int, unsigned int, bool, unsigned int, unsigned int, unsigned int, float, float, float, float, bool, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, std::ostream&, std::string&)’:
udpipe.cpp:13510:80: error: ‘to_string’ was not declared in this scope
     return error.assign("Gru tokenizer dimension '").append(to_string(dimension)).append("' is not supported!"), false;
                                                                                ^
udpipe.cpp: In static member function ‘static void ufal::udpipe::parsito::parser_nn_trainer::train(const string&, const string&, bool, const string&, const string&, const ufal::udpipe::parsito::network_parameters&, unsigned int, const std::vector<ufal::udpipe::parsito::tree>&, const std::vector<ufal::udpipe::parsito::tree>&, ufal::udpipe::utils::binary_encoder&)’:
udpipe.cpp:16453:73: error: ‘to_string’ was not declared in this scope
         embeddings_from_file_comment = "[dim" + to_string(file_dimension) + "->" + to_string(dimension) + "]";
                                                                         ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::parsito::tree_input_format_conllu::next_tree(ufal::udpipe::parsito::tree&)’:
udpipe.cpp:17913:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(node.id)).append("' form '").append(node.form).append("' has too large head: '").append(to_string(node.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual void ufal::udpipe::parsito::tree_output_format_conllu::write_tree(const ufal::udpipe::parsito::tree&, std::string&, const ufal::udpipe::parsito::tree_input_format*) const’:
udpipe.cpp:17947:30: error: ‘to_string’ was not declared in this scope
     output.append(to_string(i)).push_back('\t');
                              ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_conllu::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18218:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(word.id)).append("' form '").append(word.form).append("' has too large head: '").append(to_string(word.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_horizontal::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18314:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_vertical::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18404:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_presegmented_tokenizer::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18518:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘void ufal::udpipe::token::set_token_range(size_t, size_t)’:
udpipe.cpp:19369:58: error: ‘to_string’ was not declared in this scope
     start_misc_field("TokenRange").append(to_string(start)).append(1, ':').append(to_string(end));
                                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::morphodita_tokenizer_wrapper::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:19793:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_tagger(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21133:61: error: ‘to_string’ was not declared in this scope
         model_name = "from_model_" + to_string(++model_index);
                                                             ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_parser(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21194:100: error: ‘to_string’ was not declared in this scope
       if (embedding_upostag) embeddings.append("universal_tag ").append(to_string(embedding_upostag)).append(" 1\n");
                                                                                                    ^
udpipe.cpp:21195:88: error: ‘to_string’ was not declared in this scope
       if (embedding_feats) embeddings.append("feats ").append(to_string(embedding_feats)).append(" 1\n");
                                                                                        ^
udpipe.cpp:21196:90: error: ‘to_string’ was not declared in this scope
       if (embedding_xpostag) embeddings.append("tag ").append(to_string(embedding_xpostag)).append(" 1\n");
                                                                                          ^
udpipe.cpp:21198:67: error: ‘to_string’ was not declared in this scope
         embeddings.append("form ").append(to_string(embedding_form)).append(" ").append(to_string(embedding_form_mincount));
                                                                   ^
udpipe.cpp:21203:69: error: ‘to_string’ was not declared in this scope
         embeddings.append("lemma ").append(to_string(embedding_lemma)).append(" ").append(to_string(embedding_lemma_mincount));
                                                                     ^
udpipe.cpp:21207:91: error: ‘to_string’ was not declared in this scope
       if (embedding_deprel) embeddings.append("deprel ").append(to_string(embedding_deprel)).append(" 1\n");
                                                                                           ^
make: *** [udpipe.o] Error 1
ERROR: compilation failed for package ‘udpipe’
* removing ‘/usr/lib64/R/library/udpipe’
* restoring previous ‘/usr/lib64/R/library/udpipe’

The downloaded source packages are in
        ‘/tmp/RtmpM14MzT/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("udpipe") :
  installation of package ‘udpipe’ had non-zero exit status
jwijffels commented 4 years ago

Have you also tried with CXX11STD = -std=c++0x?

leungi commented 4 years ago

Thanks for the prompt reply; still no go.

Once again, I tried various CXX11 settings (g++, /usr/local/gcc493/bin/c++49), with the same outcome.

> dotR <- file.path(Sys.getenv("HOME"), ".R")
> if (!file.exists(dotR)) dir.create(dotR)
> M <- file.path(dotR, "Makevars")
> file.remove(M)
[1] TRUE
> if (!file.exists(M)) file.create(M)
[1] TRUE
> cat("\nCXX11 = /usr/local/gcc493/bin/g++49", "CXX11STD = -std=c++0x", file = M, sep = "\n", append = TRUE)
> readLines(M)
[1] ""                                    "CXX11 = /usr/local/gcc493/bin/g++49"
[3] "CXX11STD = -std=c++0x"
> install.packages("udpipe")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/udpipe_0.8.3.tar.gz'
Content type 'application/x-gzip' length 4750855 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
/usr/local/gcc493/bin/g++49 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
/usr/local/gcc493/bin/g++49 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
/usr/local/gcc493/bin/g++49 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
/usr/local/gcc493/bin/g++49 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
udpipe.cpp: In static member function ‘static bool ufal::udpipe::morphodita::gru_tokenizer_trainer::train(unsigned int, unsigned int, bool, unsigned int, unsigned int, unsigned int, float, float, float, float, bool, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, const std::vector<ufal::udpipe::morphodita::tokenized_sentence>&, std::ostream&, std::string&)’:
udpipe.cpp:13510:80: error: ‘to_string’ was not declared in this scope
     return error.assign("Gru tokenizer dimension '").append(to_string(dimension)).append("' is not supported!"), false;
                                                                                ^
udpipe.cpp: In static member function ‘static void ufal::udpipe::parsito::parser_nn_trainer::train(const string&, const string&, bool, const string&, const string&, const ufal::udpipe::parsito::network_parameters&, unsigned int, const std::vector<ufal::udpipe::parsito::tree>&, const std::vector<ufal::udpipe::parsito::tree>&, ufal::udpipe::utils::binary_encoder&)’:
udpipe.cpp:16453:73: error: ‘to_string’ was not declared in this scope
         embeddings_from_file_comment = "[dim" + to_string(file_dimension) + "->" + to_string(dimension) + "]";
                                                                         ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::parsito::tree_input_format_conllu::next_tree(ufal::udpipe::parsito::tree&)’:
udpipe.cpp:17913:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(node.id)).append("' form '").append(node.form).append("' has too large head: '").append(to_string(node.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual void ufal::udpipe::parsito::tree_output_format_conllu::write_tree(const ufal::udpipe::parsito::tree&, std::string&, const ufal::udpipe::parsito::tree_input_format*) const’:
udpipe.cpp:17947:30: error: ‘to_string’ was not declared in this scope
     output.append(to_string(i)).push_back('\t');
                              ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_conllu::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18218:66: error: ‘to_string’ was not declared in this scope
         return error.assign("Node ID '").append(to_string(word.id)).append("' form '").append(word.form).append("' has too large head: '").append(to_string(word.head)).append("'!"), false;
                                                                  ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_horizontal::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18314:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_vertical::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18404:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::input_format_presegmented_tokenizer::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:18518:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In member function ‘void ufal::udpipe::token::set_token_range(size_t, size_t)’:
udpipe.cpp:19369:58: error: ‘to_string’ was not declared in this scope
     start_misc_field("TokenRange").append(to_string(start)).append(1, ':').append(to_string(end));
                                                          ^
udpipe.cpp: In member function ‘virtual bool ufal::udpipe::morphodita_tokenizer_wrapper::next_sentence(ufal::udpipe::sentence&, std::string&)’:
udpipe.cpp:19793:42: error: ‘to_string’ was not declared in this scope
     s.set_sent_id(to_string(sentence_id++));
                                          ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_tagger(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21133:61: error: ‘to_string’ was not declared in this scope
         model_name = "from_model_" + to_string(++model_index);
                                                             ^
udpipe.cpp: In static member function ‘static bool ufal::udpipe::trainer_morphodita_parsito::train_parser(const std::vector<ufal::udpipe::sentence>&, const std::vector<ufal::udpipe::sentence>&, const string&, const string&, std::ostream&, std::string&)’:
udpipe.cpp:21194:100: error: ‘to_string’ was not declared in this scope
       if (embedding_upostag) embeddings.append("universal_tag ").append(to_string(embedding_upostag)).append(" 1\n");
                                                                                                    ^
udpipe.cpp:21195:88: error: ‘to_string’ was not declared in this scope
       if (embedding_feats) embeddings.append("feats ").append(to_string(embedding_feats)).append(" 1\n");
                                                                                        ^
udpipe.cpp:21196:90: error: ‘to_string’ was not declared in this scope
       if (embedding_xpostag) embeddings.append("tag ").append(to_string(embedding_xpostag)).append(" 1\n");
                                                                                          ^
udpipe.cpp:21198:67: error: ‘to_string’ was not declared in this scope
         embeddings.append("form ").append(to_string(embedding_form)).append(" ").append(to_string(embedding_form_mincount));
                                                                   ^
udpipe.cpp:21203:69: error: ‘to_string’ was not declared in this scope
         embeddings.append("lemma ").append(to_string(embedding_lemma)).append(" ").append(to_string(embedding_lemma_mincount));
                                                                     ^
udpipe.cpp:21207:91: error: ‘to_string’ was not declared in this scope
       if (embedding_deprel) embeddings.append("deprel ").append(to_string(embedding_deprel)).append(" 1\n");
                                                                                           ^
make: *** [udpipe.o] Error 1
ERROR: compilation failed for package ‘udpipe’
* removing ‘/usr/lib64/R/library/udpipe’
* restoring previous ‘/usr/lib64/R/library/udpipe’

The downloaded source packages are in
        ‘/tmp/RtmpjnFp4l/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("udpipe") :
  installation of package ‘udpipe’ had non-zero exit status
jwijffels commented 4 years ago

Your best bet is to search Stack Overflow for the error you are getting, namely ‘to_string’ was not declared in this scope. It looks like your compiler does not support C++11. I don't have a Red Hat system to test on myself.

leungi commented 4 years ago

Noted.

I just checked a few SO posts, and they all point to your suggestion: CXX11STD = -std=c++0x.

I'll keep researching and update here.

Thanks again.

leungi commented 4 years ago

Update.

I got a new VM that comes with gcc and g++ > 4.9.0 and installed udpipe fresh; however, I'm still running into the same issue.

[leungi@ohylpyt2-d ~]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --disable-multilib --enable-languages=c,c++
Thread model: posix
gcc version 4.9.2 (GCC)

[leungi@ohylpyt2-d ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --disable-multilib --enable-languages=c,c++
Thread model: posix
gcc version 4.9.2 (GCC)
[leungi@ohylpyt2-d ~]$ R
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(udpipe)
> packageVersion("udpipe")
[1] ‘0.8.3’
> data(brussels_reviews_anno, package = "udpipe")
> x <- subset(brussels_reviews_anno, language %in% "fr")
> keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
[1] keyword ngram   pattern start   end
<0 rows> (or 0-length row.names)
jwijffels commented 4 years ago

How udpipe gets compiled depends on the configuration settings chosen when R was installed. You can retrieve these settings with R CMD config --all in the shell. Next, check in the installation trace of the udpipe package whether it used your C++11 compiler, which needs to ship a recent version of <regex>.

leungi commented 4 years ago

Thanks for the tip! I'll update with my progress.

leungi commented 4 years ago

I'm hitting a wall here.

Based on R CMD config --all (below), I realized that CXX11STD = -std=gnu++11, so I created a custom ~/.R/Makevars.

I read in a couple of posts that even with a custom Makevars, the package's /src/Makevars takes precedence; I suspect this may be the case here.

[leungi@ohylpyt2-d ~]$ R CMD config --all
CC = gcc -m64 -std=gnu99
CFLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CPICFLAGS = -fpic
CPP = gcc -m64 -std=gnu99 -E
CPPFLAGS = -I/usr/local/include
CXX = g++ -m64 -std=gnu++11
CXXCPP = g++ -m64 -std=gnu++11 -E
CXXFLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CXXPICFLAGS = -fpic
CXX98 = g++ -m64
CXX98STD = -std=gnu++98
CXX98FLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CXX98PICFLAGS = -fpic
CXX11 = g++ -m64
CXX11STD = -std=gnu++11
CXX11FLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CXX11PICFLAGS = -fpic
CXX14 = g++ -std=c++1y
CXX14STD =
CXX14FLAGS = -O3 -march=native -mtune=native -fPIC -Wno-unused-variable -Wno-unused-function
CXX14PICFLAGS =
CXX17 =
CXX17STD =
CXX17FLAGS =
CXX17PICFLAGSS =
DYLIB_EXT = .so
DYLIB_LD = gcc -m64 -std=gnu99
DYLIB_LDFLAGS = -shared -fopenmp
FC = gfortran -m64
F77 = gfortran -m64
FFLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/usr/lib64/gfortran/modules
FPICFLAGS = -fpic
FLIBS = -lgfortran -lm -lquadmath
FCFLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
FCPICFLAGS = -fpic
SAFE_FFLAGS = -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -I/usr/lib64/gfortran/modules -msse2 -mfpmath=sse
OBJC = gcc
OBJCFLAGS = -g -O2 -fobjc-exceptions
JAVA = /bin/java
JAVAC = /bin/javac
JAVAH = /bin/javah
JAR = /bin/jar
JAVA_HOME = /usr/lib/jvm/jre
JAVA_LIBS = -L/usr/lib/jvm/jre/lib/amd64/server -ljvm
JAVA_CPPFLAGS = -I/usr/lib/jvm/java/include -I/usr/lib/jvm/java/include/linux
LDFLAGS = -Wl,-z,relro
SHLIB_CFLAGS =
SHLIB_CXXFLAGS =
SHLIB_CXXLD = g++ -m64 -std=gnu++11
SHLIB_CXXLDFLAGS = -shared
SHLIB_CXX98LD = g++ -m64 -std=gnu++98
SHLIB_CXX98LDFLAGS = -shared
SHLIB_CXX11LD = g++ -m64 -std=gnu++11
SHLIB_CXX11LDFLAGS = -shared
SHLIB_CXX14LD = g++ -std=c++1y
SHLIB_CXX14LDFLAGS = -shared
SHLIB_CXX17LD =
SHLIB_CXX17LDFLAGS = -shared
SHLIB_EXT = .so
SHLIB_FFLAGS =
SHLIB_LD = gcc -m64 -std=gnu99
SHLIB_LDFLAGS = -shared
TCLTK_CPPFLAGS = -I/usr/include -I/usr/include
TCLTK_LIBS = -L/usr/lib64 -ltcl8.5 -L/usr/lib64 -ltk8.5 -lX11
BLAS_LIBS = -L/usr/lib64/R/lib -lRblas
LAPACK_LIBS = -L/usr/lib64/R/lib -lRlapack
MAKE = make
LIBnn = lib64

Installs with the custom ~/.R/Makevars are successful, but I'm still facing the same issue.

Various installs

> dotR <- file.path(Sys.getenv("HOME"), ".R")
> if (!file.exists(dotR)) dir.create(dotR)
> M <- file.path(dotR, "Makevars")
> if (file.exists(M)) file.remove(M)
[1] TRUE
> if (!file.exists(M)) file.create(M)
[1] TRUE
> cat("\nCXX = g++",
+     "CXX11STD = -std=c++0x",
+     file = M, sep = "\n", append = TRUE)
> readLines(M)
[1] ""                      "CXX = g++"             "CXX11STD = -std=c++0x"
> install.packages("/home/leungi/R_setup/udpipe_0.8.3.tar.gz", repos = NULL)
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
g++ -m64 -std=c++0x -shared -L/usr/lib64/R/lib -Wl,-z,relro -o udpipe.so RcppExports.o rcpp_phrases.o rcpp_udpipe.o udpipe.o -L/usr/lib64/R/lib -lR
installing to /usr/lib64/R/library/00LOCK-udpipe/00new/udpipe/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package ‘udpipe’
    finding HTML links ... done
    as.data.frame.udpipe_connlu             html
    as.matrix.cooccurrence                  html
    as_conllu                               html
    as_cooccurrence                         html
    as_phrasemachine                        html
    as_word2vec                             html
    brussels_listings                       html
    brussels_reviews                        html
    brussels_reviews_anno                   html
    cbind_dependencies                      html
    cbind_morphological                     html
    cooccurrence                            html
    document_term_frequencies               html
    document_term_frequencies_statistics    html
    document_term_matrix                    html
    dtm_bind                                html
    dtm_colsums                             html
    dtm_cor                                 html
    dtm_remove_lowfreq                      html
    dtm_remove_sparseterms                  html
    dtm_remove_terms                        html
    dtm_remove_tfidf                        html
    dtm_reverse                             html
    dtm_tfidf                               html
    keywords_collocation                    html
    keywords_phrases                        html
    keywords_rake                           html
    paste.data.frame                        html
    predict.LDA                             html
    strsplit.data.frame                     html
    txt_collapse                            html
    txt_contains                            html
    txt_freq                                html
    txt_highlight                           html
    txt_next                                html
    txt_nextgram                            html
    txt_previous                            html
    txt_previousgram                        html
    txt_recode                              html
    txt_recode_ngram                        html
    txt_sample                              html
    txt_sentiment                           html
    txt_show                                html
    txt_tagsequence                         html
    udpipe                                  html
    udpipe_accuracy                         html
    udpipe_annotate                         html
    udpipe_annotation_params                html
    udpipe_download_model                   html
    udpipe_load_model                       html
    udpipe_read_conllu                      html
    udpipe_train                            html
    unique_identifier                       html
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (udpipe)
Making 'packages.html' ... done
> library(udpipe)
> packageVersion("udpipe")
[1] ‘0.8.3’
> data(brussels_reviews_anno, package = "udpipe")
> x <- subset(brussels_reviews_anno, language %in% "fr")
> keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
[1] keyword ngram   pattern start   end
<0 rows> (or 0-length row.names)
> install.packages("udpipe")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/udpipe_0.8.3.tar.gz'
Content type 'application/x-gzip' length 4750855 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
g++ -m64 -std=c++0x -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
g++ -m64 -std=c++0x -shared -L/usr/lib64/R/lib -Wl,-z,relro -o udpipe.so RcppExports.o rcpp_phrases.o rcpp_udpipe.o udpipe.o -L/usr/lib64/R/lib -lR
installing to /usr/lib64/R/library/00LOCK-udpipe/00new/udpipe/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package ‘udpipe’
    finding HTML links ... done
    as.data.frame.udpipe_connlu             html
    as.matrix.cooccurrence                  html
    as_conllu                               html
    as_cooccurrence                         html
    as_phrasemachine                        html
    as_word2vec                             html
    brussels_listings                       html
    brussels_reviews                        html
    brussels_reviews_anno                   html
    cbind_dependencies                      html
    cbind_morphological                     html
    cooccurrence                            html
    document_term_frequencies               html
    document_term_frequencies_statistics    html
    document_term_matrix                    html
    dtm_bind                                html
    dtm_colsums                             html
    dtm_cor                                 html
    dtm_remove_lowfreq                      html
    dtm_remove_sparseterms                  html
    dtm_remove_terms                        html
    dtm_remove_tfidf                        html
    dtm_reverse                             html
    dtm_tfidf                               html
    keywords_collocation                    html
    keywords_phrases                        html
    keywords_rake                           html
    paste.data.frame                        html
    predict.LDA                             html
    strsplit.data.frame                     html
    txt_collapse                            html
    txt_contains                            html
    txt_freq                                html
    txt_highlight                           html
    txt_next                                html
    txt_nextgram                            html
    txt_previous                            html
    txt_previousgram                        html
    txt_recode                              html
    txt_recode_ngram                        html
    txt_sample                              html
    txt_sentiment                           html
    txt_show                                html
    txt_tagsequence                         html
    udpipe                                  html
    udpipe_accuracy                         html
    udpipe_annotate                         html
    udpipe_annotation_params                html
    udpipe_download_model                   html
    udpipe_load_model                       html
    udpipe_read_conllu                      html
    udpipe_train                            html
    unique_identifier                       html
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (udpipe)
Making 'packages.html' ... done

The downloaded source packages are in
        ‘/tmp/RtmpXEa4Fl/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
> library(udpipe)
> packageVersion("udpipe")
[1] ‘0.8.3’
> data(brussels_reviews_anno, package = "udpipe")
> x <- subset(brussels_reviews_anno, language %in% "fr")
> keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
[1] keyword ngram   pattern start   end
<0 rows> (or 0-length row.names)
>
> dotR <- file.path(Sys.getenv("HOME"), ".R")
> if (!file.exists(dotR)) dir.create(dotR)
> M <- file.path(dotR, "Makevars")
> if (file.exists(M)) file.remove(M)
[1] TRUE
> if (!file.exists(M)) file.create(M)
[1] TRUE
> cat("\nCXX = g++",
+     "CXX11STD = -std=c++11",
+     file = M, sep = "\n", append = TRUE)
> readLines(M)
[1] ""                      "CXX = g++"             "CXX11STD = -std=c++11"
> install.packages("udpipe")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/udpipe_0.8.3.tar.gz'
Content type 'application/x-gzip' length 4750855 bytes (4.5 MB)
==================================================
downloaded 4.5 MB

* installing *source* package ‘udpipe’ ...
** package ‘udpipe’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c RcppExports.cpp -o RcppExports.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_phrases.cpp -o rcpp_phrases.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c rcpp_udpipe.cpp -o rcpp_udpipe.o
g++ -m64 -std=c++11 -I"/usr/include/R" -DNDEBUG  -I"/usr/lib64/R/library/Rcpp/include" -I/usr/local/include  -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic  -c udpipe.cpp -o udpipe.o
g++ -m64 -std=c++11 -shared -L/usr/lib64/R/lib -Wl,-z,relro -o udpipe.so RcppExports.o rcpp_phrases.o rcpp_udpipe.o udpipe.o -L/usr/lib64/R/lib -lR
installing to /usr/lib64/R/library/00LOCK-udpipe/00new/udpipe/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package ‘udpipe’
    finding HTML links ... done
    as.data.frame.udpipe_connlu             html
    as.matrix.cooccurrence                  html
    as_conllu                               html
    as_cooccurrence                         html
    as_phrasemachine                        html
    as_word2vec                             html
    brussels_listings                       html
    brussels_reviews                        html
    brussels_reviews_anno                   html
    cbind_dependencies                      html
    cbind_morphological                     html
    cooccurrence                            html
    document_term_frequencies               html
    document_term_frequencies_statistics    html
    document_term_matrix                    html
    dtm_bind                                html
    dtm_colsums                             html
    dtm_cor                                 html
    dtm_remove_lowfreq                      html
    dtm_remove_sparseterms                  html
    dtm_remove_terms                        html
    dtm_remove_tfidf                        html
    dtm_reverse                             html
    dtm_tfidf                               html
    keywords_collocation                    html
    keywords_phrases                        html
    keywords_rake                           html
    paste.data.frame                        html
    predict.LDA                             html
    strsplit.data.frame                     html
    txt_collapse                            html
    txt_contains                            html
    txt_freq                                html
    txt_highlight                           html
    txt_next                                html
    txt_nextgram                            html
    txt_previous                            html
    txt_previousgram                        html
    txt_recode                              html
    txt_recode_ngram                        html
    txt_sample                              html
    txt_sentiment                           html
    txt_show                                html
    txt_tagsequence                         html
    udpipe                                  html
    udpipe_accuracy                         html
    udpipe_annotate                         html
    udpipe_annotation_params                html
    udpipe_download_model                   html
    udpipe_load_model                       html
    udpipe_read_conllu                      html
    udpipe_train                            html
    unique_identifier                       html
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (udpipe)
Making 'packages.html' ... done

The downloaded source packages are in
        ‘/tmp/RtmpXEa4Fl/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
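One quick way to tell whether an empty result like the one above comes from the data or from the compiled package is to search the pasted-together tags with base R's own regex engine, which does not depend on the C++ compiler used to build udpipe. The helper below is a hypothetical sketch, not part of udpipe; it assumes the tags are concatenated without a separator, as the pattern "DTNNVBRBJJ" suggests.

```r
# Hypothetical diagnostic helper (not part of udpipe): counts matches of a
# tag-sequence pattern in the concatenated tags using base R's regex engine,
# which does not depend on the C++ compiler that built udpipe.
count_tag_pattern <- function(tags, pattern) {
  # Tags are pasted without a separator, mirroring patterns like "DTNNVBRBJJ"
  tagstring <- paste(tags, collapse = "")
  hits <- gregexpr(pattern, tagstring)[[1]]
  # gregexpr() returns -1 when there is no match
  if (hits[1] == -1) 0L else length(hits)
}

# Toy check: two consecutive DT NN VB RB JJ sequences
count_tag_pattern(c("DT", "NN", "VB", "RB", "JJ", "DT", "NN", "VB", "RB", "JJ"),
                  "DTNNVBRBJJ")
# 2
```

On the data from this thread, count_tag_pattern(x$xpos, "DTNNVBRBJJ") returning a nonzero count while keywords_phrases() returns 0 rows would point at the installed binary rather than the input. The count is only an upper bound, since a match can straddle a tag boundary (for example "PDT" followed by "NN" also contains "DTNN").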
leungi commented 4 years ago

Problem solved, thanks to this SO post.

Steps:

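# per-user directory for R build settings
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, "Makevars")
# start from an empty Makevars file
if (file.exists(M)) file.remove(M)
if (!file.exists(M)) file.create(M)
# use the g++ in /usr/local/bin as the compiler for C++11 code
cat("CXX11=/usr/local/bin/g++",
    "CXX11STD=-std=c++11",
    file = M, sep = "\n", append = TRUE)
# reinstall so udpipe is rebuilt from source with that compiler
install.packages("udpipe")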

Successful test

> library(udpipe)
> packageVersion("udpipe")
[1] ‘0.8.3’
> data(brussels_reviews_anno, package = "udpipe")
> x <- subset(brussels_reviews_anno, language %in% "fr")
> output <- keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
> head(output)
                              keyword ngram    pattern start  end
1 L' appartement est vraiment parfait     5 DTNNVBRBJJ   473  477
2          Le quartier est tres calme     5 DTNNVBRBJJ   536  540
3   le quartier est vraiment agreable     5 DTNNVBRBJJ  1679 1683
4 L' appartement est tres confortable     5 DTNNVBRBJJ  1809 1813
5       Le logement est tres agreable     5 DTNNVBRBJJ  2280 2284
6    L' appartement est tres agreable     5 DTNNVBRBJJ  3061 3065
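The columns in this output (keyword, ngram, pattern, start, end) can be read off a minimal base-R sketch of n-gram tag-sequence matching. This is an illustration of the idea under the assumption that each candidate n-gram's tags are pasted together and kept only when the whole sequence matches the regex; it is not udpipe's C++ implementation, and match_tag_ngrams() is a hypothetical name.

```r
# Minimal sketch of n-gram tag-sequence matching (an illustration of the idea,
# not udpipe's C++ implementation): for every start position and n-gram length
# up to ngram_max, paste the tags together and keep the n-gram if the pasted
# sequence matches the whole regex.
match_tag_ngrams <- function(tags, terms, pattern, ngram_max = 8) {
  full_pattern <- paste0("^(", pattern, ")$")  # anchor: whole sequence must match
  rows <- list()
  for (start in seq_along(tags)) {
    for (n in seq_len(ngram_max)) {
      end <- start + n - 1
      if (end > length(tags)) break
      tagseq <- paste(tags[start:end], collapse = "")
      if (grepl(full_pattern, tagseq)) {
        rows[[length(rows) + 1]] <- data.frame(
          keyword = paste(terms[start:end], collapse = " "),
          ngram = n, pattern = tagseq, start = start, end = end,
          stringsAsFactors = FALSE)
      }
    }
  }
  # return an empty data frame with the same columns when nothing matches
  if (length(rows) == 0) {
    return(data.frame(keyword = character(0), ngram = integer(0),
                      pattern = character(0), start = integer(0),
                      end = integer(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

tags  <- c("DT", "NN", "VB", "RB", "JJ", ".")
terms <- c("Le", "quartier", "est", "tres", "calme", ".")
match_tag_ngrams(tags, terms, "DTNNVBRBJJ")
```

On this toy input the sketch returns a single row for "Le quartier est tres calme" with ngram 5, start 1 and end 5; start and end are positions in the input vectors, which is why the real output above reports values such as 473 and 477.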

Thanks again for your guidance @jwijffels!

jwijffels commented 4 years ago

Good that you found out how to do it! I will probably need that information next week as well :)