sanjmeh closed this issue 6 years ago
Thanks for reporting. I tested this on Windows and it gave 72 results for
np <- keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
on version 0.4 as well as on version 0.5. The function itself has also not changed between version 0.4 and version 0.5 of this R package.
So this seems to be Linux-specific. The regular expression matching uses <regex> from C++11, which only became available in gcc 4.9.0. Which version of gcc do you have on your machine (what does gcc --version report)?
Update: I checked this on Ubuntu 14.04 with gcc 4.8.4, and indeed
np <- keywords_phrases(x$xpos, pattern = c("DTNNVBRBJJ"), term = x$token, is_regex = T)
did not return anything, while on Ubuntu 16.04 with gcc 5.4.0 everything works fine and the same call returns 72 rows.
Solution: make sure you have at least gcc 4.9.0 (see also: https://stackoverflow.com/questions/12530406/is-gcc-4-8-or-earlier-buggy-about-regular-expressions).
Upgraded to gcc 4.9 (earlier it was 4.8.5).
Current version:
gcc --version
gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4
But the problem persists.
> library(udpipe)
....
> np <- keywords_phrases(x$xpos, pattern = c("DT"), term = x$token,is_regex = T)
> np
# [1] keyword ngram pattern start end
# <0 rows> (or 0-length row.names)
Here are the steps I took to upgrade gcc; you may be able to point out whether I made a mistake somewhere. I appreciate that this is beyond the scope of the udpipe package, but it may save a lot of frustration for other udpipe users on Ubuntu 14.04 or with gcc 4.8 or lower.
I have an Ubuntu 14.04 machine.
I followed these instructions to update, and the update completed successfully.
As a last step, I found it necessary to change the symbolic link /usr/bin/g++ from a target of /usr/bin/g++-4.8 to a target of /usr/bin/g++-4.9.
I also checked the gcc version: it shows 4.9, but the regex still returns false.
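Have you re-installed the udpipe package after you upgraded gcc? Please do. The package's C++ code is compiled when the package is installed from source, so it needs to be rebuilt with the new compiler.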
Yes indeed, I had not reinstalled udpipe. Finally, it works. Thank you so much. Now I can hope to load my NLP packages online on Ubuntu and share with people for annotating manually and displaying processed text using shinydashboards or flexdashboards. I am closing this issue now. Thanks a lot.
Feel free to share shinydashboards & flexdashboards. That would be interesting!
@jwijffels : could you pls share your email id? Don't know how to communicate with you when there's no issue I have to report.
You can find my email here: https://github.com/bnosac/udpipe/blob/master/DESCRIPTION
I am running the same code on the same data side by side on two machines.
One is on udpipe version 0.4 and the other on udpipe version 0.5.
The keywords_phrases() function is broken on 0.5 if we use is_regex=T.
Consider the sample example in your help document.
The above should work in both 0.4 & 0.5.
Now consider the same example, but with the function executed with is_regex=T.
I tried many regular expressions, even one as simple as pattern = "DTJJ", but none of them works. It seems the regex option does not work. I have also verified that regex works on the machine (an Ubuntu server) by trying the grep family of commands in R. So regex fails only in the udpipe function.