bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

keywords_phrases broken. Alternatives to updating gcc? #43

Closed randomgambit closed 5 years ago

randomgambit commented 5 years ago

Hi @jwijffels, I have the same issue as https://github.com/bnosac/udpipe/issues/20, but I don't have the possibility to update gcc. Are there other possible solutions? Couldn't you use the stringr package in the udpipe code instead? It's really the first time ever that something has broken for me because of Linux. What do you think?

Thanks!

jwijffels commented 5 years ago

Indeed, the solution to issue #20 is simply to update your gcc compiler to at least version 4.9. The main workhorse for keywords_phrases was written in C++ rather than with basic R regex because it essentially needs one large for loop over all the rows, which runs better in C++ than in R. The C++ code is here: https://github.com/bnosac/udpipe/blob/master/src/rcpp_phrases.cpp. You can fairly easily change that to use simple regular expressions directly from R, but I doubt that this will be very efficient.
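For anyone who cannot update gcc, here is a minimal sketch of what such a pure-R version could look like. It is not the code in rcpp_phrases.cpp and it is not part of udpipe: the function name `phrases_regex` and its arguments are hypothetical, and it assumes each part-of-speech tag has already been recoded to a single letter (for example A for adjective, N for noun). It uses only base R, so no compiler is involved, but the nested loop will be slow on large corpora.

```r
# Hypothetical helper, NOT part of udpipe: a base-R sketch of phrase detection.
# tags      : character vector of one-letter POS codes (assumed recoding)
# terms     : character vector of tokens, same length as tags
# pattern   : regular expression over the one-letter codes
# ngram_max : longest phrase (in tokens) to consider
phrases_regex <- function(tags, terms, pattern, ngram_max = 4) {
  stopifnot(length(tags) == length(terms), all(nchar(tags) == 1))
  # Anchor the pattern so the whole candidate tag sequence has to match
  anchored <- paste0("^(", pattern, ")$")
  n <- length(tags)
  found <- character(0)
  # The large loop over all rows: try every start position ...
  for (start in seq_len(n)) {
    # ... and every phrase length that still fits in the data
    for (len in seq_len(min(ngram_max, n - start + 1))) {
      idx <- start:(start + len - 1)
      if (grepl(anchored, paste(tags[idx], collapse = ""))) {
        found <- c(found, paste(terms[idx], collapse = " "))
      }
    }
  }
  found
}

# Toy input: D = determiner, A = adjective, N = noun, V = verb
terms <- c("the", "quick", "brown", "fox", "jumps")
tags  <- c("D", "A", "A", "N", "V")
phrases_regex(tags, terms, pattern = "(A|N)*N", ngram_max = 4)
```

Wrapping the result in `table()` gives phrase frequencies. The sketch reports every matching span, including shorter ones nested inside a longer match, and it does not respect sentence or document boundaries, so you would need to call it per sentence.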