agentile / PHP-Stanford-NLP

PHP interface to Stanford NLP tools (POS Tagger, NER, Parser)

Empty array returned #1

Closed jaspal747 closed 9 years ago

jaspal747 commented 10 years ago

Hi,

I am trying to use this POSTagger as described in the steps, but I am getting an empty array as output. I have tried your PHP parser both from the command line and via the browser, and I get an empty array in both cases.

I am using the latest version (3.3.1) of the Stanford POS tagger (stanford-postagger-2014-01-04), which is 35 MB. I have tested it via their GUI and it works fine.

Any help would be highly appreciated.

Thanks

agentile commented 10 years ago

Hello,

Thanks for letting me know, I'll take a look as soon as I can and get back to you, thanks!

jaspal747 commented 10 years ago

Thanks for the quick reply!

agentile commented 10 years ago

I downloaded http://nlp.stanford.edu/software/stanford-postagger-full-2014-01-04.zip (105 MB). The English-only version is 24 MB, but you mentioned 35 MB, so I am wondering why yours is different.

Then I ran the following script:

<?php
require_once 'StanfordNLP/Base.php';
require_once 'StanfordNLP/StanfordTagger.php';
require_once 'StanfordNLP/POSTagger.php';
$pos = new \StanfordNLP\POSTagger(
  '/var/www/stanford-postagger-full-2014-01-04/models/english-left3words-distsim.tagger',
  '/var/www/stanford-postagger-full-2014-01-04/stanford-postagger.jar'
);
$result = $pos->tag(explode(' ', "What does the fox say?"));
var_dump($result);

and received the following output:

array(6) {
  [0]=>
  array(2) {
    [0]=>
    string(4) "What"
    [1]=>
    string(2) "WP"
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "does"
    [1]=>
    string(3) "VBZ"
  }
  [2]=>
  array(2) {
    [0]=>
    string(3) "the"
    [1]=>
    string(2) "DT"
  }
  [3]=>
  array(2) {
    [0]=>
    string(3) "fox"
    [1]=>
    string(2) "NN"
  }
  [4]=>
  array(2) {
    [0]=>
    string(3) "say"
    [1]=>
    string(2) "VB"
  }
  [5]=>
  array(2) {
    [0]=>
    string(1) "?"
    [1]=>
    string(1) "."
  }
}

Can you replicate these steps and see what happens? You have Java on your system, right?

jaspal747 commented 10 years ago

I am using "http://nlp.stanford.edu/downloads/stanford-postagger-2014-01-04.zip"

From this page: http://nlp.stanford.edu/downloads/tagger.shtml

It lists two downloads: "Download basic English Stanford Tagger version 3.3.1 [35 MB]" and "Download full Stanford Tagger version 3.3.1 [155 MB]".

I will give your steps a try.

And yes I have Java on my system :)

agentile commented 10 years ago

Hello @jaspal747 , did you ever figure this out? Thanks!

samkool commented 10 years ago

Hi,

I have the same problem. But when I look at the $pos object, it stores an error stating the following: "Error: Could not find or load main class edu.stanford.nlp.tagger.maxent.MaxentTagger". After trying a bazillion things I still can't get it to work. I am using WAMP on a Windows 8 machine. This is the code I use:

require_once '../libraries/StanfordNLP/Base.php';
require_once '../libraries/StanfordNLP/StanfordTagger.php';
require_once '../libraries/StanfordNLP/POSTagger.php';
$pos = new \StanfordNLP\POSTagger(
  '[HIDDEN]/libraries/StanfordNLP/stanford-postagger-2014-01-04/models/english-left3words-distsim.tagger',
  '[HIDDEN]/libraries/StanfordNLP/stanford-postagger-2014-01-04/stanford-postagger.jar'
);
$result = $pos->tag(explode(' ', "What does the fox say?"));
var_dump($result);
var_dump($pos);

agentile commented 10 years ago

@samkool Did you set the CLASSPATH as mentioned at http://nlp.stanford.edu/software/pos-tagger-faq.shtml ?

samkool commented 10 years ago

Thanks for the quick response! This is the command that is used:

java -cp '[HIDDEN]/libraries/StanfordNLP/stanford-postagger-2014-01-04/stanford-postagger.jar;' edu.stanford.nlp.tagger.maxent.MaxentTagger -model [HIDDEN]/libraries/StanfordNLP/stanford-postagger-2014-01-04/models/english-left3words-distsim.tagger -textFile C:\Windows\Temp\phpCDB2.tmp -outputFormat slashTags -tagSeparator _ -encoding utf8

Playing around with it didn't get me much further.

EDIT: Apparently I forgot to try deleting the ; and the ' characters surrounding the .jar path. That appears to be the fix. They were in the original code though, so maybe that is a bug?
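The quoting problem described above can be sidestepped by letting PHP choose the quoting for each argument. The following is only a sketch and is not the library's code: `buildTaggerCommand` is a hypothetical helper, and it assumes the same MaxentTagger options that appear in the command quoted in this thread. It has not been verified on Windows.

```php
<?php
// Hypothetical helper, not part of PHP-Stanford-NLP: builds the tagger
// command line with every argument quoted for the current platform.
function buildTaggerCommand(
    string $java,
    array $jars,
    string $model,
    string $textFile,
    string $separator = '_'
): string {
    // PATH_SEPARATOR is ';' on Windows and ':' elsewhere, which matches the
    // separator Java expects between classpath entries. No trailing
    // separator is appended.
    $classpath = implode(PATH_SEPARATOR, $jars);

    $args = [
        '-cp', $classpath,
        'edu.stanford.nlp.tagger.maxent.MaxentTagger',
        '-model', $model,
        '-textFile', $textFile,
        '-outputFormat', 'slashTags',
        '-tagSeparator', $separator,
        '-encoding', 'utf8',
    ];

    // escapeshellarg() picks the quoting style for the host OS
    // (double quotes on Windows, single quotes on Unix-like systems).
    return escapeshellarg($java) . ' ' . implode(' ', array_map('escapeshellarg', $args));
}
```

Printing the returned string and pasting it into a terminal is a quick way to see whether Java accepts the classpath before involving PHP's process functions.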

agentile commented 10 years ago

Looks like you are using Windows. I've never tested with a Windows setup. Likely the issue is that you need an absolute path to Java. If you were to enter the command into the Windows terminal (Start -> Run -> cmd), my guess is that it wouldn't work, because the java command isn't available globally. So you need to find out where your Java is located and call $pos->setJavaPath('C:\path\to\javabinary') before you call $pos->tag().
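One way to test that guess from PHP itself is to try a few candidate binaries and keep the first one that answers `-version`. This is a sketch under assumptions: `resolveJavaBinary` is a hypothetical helper, not part of the library, and it assumes `exec()` is enabled on the host.

```php
<?php
// Hypothetical helper: returns the first candidate that runs
// "<candidate> -version" successfully, or null if none does.
function resolveJavaBinary(array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        $output = [];
        $exitCode = 1;
        // "java -version" prints to stderr, so stderr is folded into stdout
        // to keep the console quiet; only the exit code is inspected.
        exec(escapeshellarg($candidate) . ' -version 2>&1', $output, $exitCode);
        if ($exitCode === 0) {
            return $candidate;
        }
    }
    return null;
}
```

The candidate list could start with plain `java` followed by any absolute paths you suspect; if the result is not null, it can be passed to `$pos->setJavaPath()` before calling `$pos->tag()`.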

agentile commented 10 years ago

Additionally, you are including everything incorrectly. In my examples I am using Unix, where the directory separator is /, but on Windows it is \, so you need to update your code so that the paths use the proper directory separator.

samkool commented 10 years ago

The default Java path works fine, so I would recommend this to other people having this issue: go into StanfordTagger.php and change the variable declaration on lines 122/123/124 to:

$cmd = escapeshellcmd($this->getJavaPath() . " $options -cp " . $this->getJar()
    . " edu.stanford.nlp.tagger.maxent.MaxentTagger -model " . $this->getModel()
    . " -textFile " . $tmpfname . " -outputFormat slashTags -tagSeparator " . $separator . " -encoding utf8");

samkool commented 10 years ago

Haha, I might have the / \ wrong but it works, so I am not gonna mess with it anymore :P

domarco commented 8 years ago

@samkool did changing the variable declaration on lines 122/123/124 fix the issue? I still get an error. Do you mean that the code on lines 122/123/124 is inside $descriptorspec = array( ?

covaberjon commented 8 years ago

@domarco @samkool I've got the same issue. Did you solve that problem?

domarco commented 8 years ago

@diegorodriguezvidal try using version stanford-postagger-2015-04-20; it solved my problem. And don't forget to manually define your POS tagger library path, as in the example below:

$pos = new \StanfordNLP\POSTagger(
  'D:/Project/NLP/stanford-postagger-2015-04-20/models/english-left3words-distsim.tagger',
  'D:/Project/NLP/stanford-postagger-2015-04-20/stanford-postagger.jar'
);

The string "D:/Project/NLP" was my folder path to the "stanford-postagger-2015-04-20" folder. You can also save the path to a "$variable" first.

Have fun, hope this helps. *I'm using Windows and the code works fine.
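Since a wrong path is the suspected cause here, it may help to verify both files before constructing the tagger. The sketch below is illustrative only: `missingTaggerFiles` is a hypothetical helper, and the two file names are taken from the paths quoted in this thread, so adjust them if your download is laid out differently.

```php
<?php
// Hypothetical helper: lists the tagger files that are missing or unreadable
// under $baseDir. The file names follow the layout quoted in this thread.
function missingTaggerFiles(string $baseDir): array
{
    // Drop any trailing slash or backslash so the joined paths stay clean.
    $baseDir = rtrim($baseDir, '/\\');

    $required = [
        $baseDir . '/models/english-left3words-distsim.tagger',
        $baseDir . '/stanford-postagger.jar',
    ];

    // Keep only the paths PHP cannot read; array_values() reindexes from 0.
    return array_values(array_filter($required, function (string $path): bool {
        return !is_readable($path);
    }));
}
```

An empty result means both paths are readable by the PHP process, which is worth checking separately on localhost and in production because the web server user may differ.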

covaberjon commented 8 years ago

@domarco Thanks for the help, I tried your approach and it doesn't work for me. In my case everything is working on localhost. I am just having this error code in production: Error: Could not find or load main class edu.stanford.nlp.parser.lexparser.LexicalizedParser and I don't know how to solve it...

samkool commented 8 years ago

Hey Diego, I'll look at it when I get home Monday!

samarthbhasin commented 5 years ago

Hi @diegorodriguezvidal, did you ever solve this? I'm having a similar issue. I'm trying to load stanford-ner.jar on Spark executors on an AWS EMR cluster. My code just can't find or load the main class.

domarco commented 5 years ago

@samarthbhasin have you tried my approach of manually defining your file path? Try using version stanford-postagger-2015-04-20, which was the stable version. Your error means that it cannot find the file, most likely because of a wrong path.

samarthbhasin commented 5 years ago

@domarco this is what the error looks like, including the Java command being executed. From this it seems like it can definitely find the jar file; the only problem is that it is unable to load the main class, and I am not sure why.

OSError: Java command failed : ['/usr/lib/jvm/java-1.8.0/bin/java', '-mx4096m', '-cp', '/home/hadoop/stanford/stanford-ner-production/stanford-ner-3.9.1-sources.jar:/home/hadoop/stanford/stanford-ner-production/slf4j-log4j12-1.7.25.jar:/home/hadoop/stanford/stanford-ner-production/stanford-ner-3.9.1.jar:/home/hadoop/stanford/stanford-ner-production/stanford-ner-3.9.1-javadoc.jar:/home/hadoop/stanford/stanford-ner-production/stanford-ner.jar:/home/hadoop/stanford/stanford-ner-production/slf4j-api-1.7.25.jar:/home/hadoop/stanford/stanford-ner-production/lib/joda-time.jar:/home/hadoop/stanford/stanford-ner-production/lib/jollyday-0.4.9.jar:/home/hadoop/stanford/stanford-ner-production/lib/stanford-ner-resources.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/hadoop/stanford/stanford-ner-production/classifiers/english.all.3class.distsim.crf.ser.gz', '-textFile', '/tmp/tmpdJuvgP', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf-8']
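The JVM does not complain about classpath entries that do not exist, so the same "Could not find or load main class" message can appear whether a jar is missing on that machine or simply does not contain the class. A jar is a zip archive, so each entry can be inspected directly. The sketch below is written in PHP to match the rest of this thread; `findClassInClasspath` is a hypothetical diagnostic, it assumes the zip extension (`ZipArchive`) is installed, and it ignores directory and wildcard classpath entries.

```php
<?php
// Hypothetical diagnostic: for each classpath entry, report whether the file
// exists and whether it contains the compiled main class.
function findClassInClasspath(
    string $classpath,
    string $className,
    string $separator = PATH_SEPARATOR
): array {
    // edu.stanford.nlp.ie.crf.CRFClassifier
    //   -> edu/stanford/nlp/ie/crf/CRFClassifier.class
    $entryName = str_replace('.', '/', $className) . '.class';
    $report = [];

    foreach (explode($separator, $classpath) as $jar) {
        if ($jar === '') {
            continue; // skip empty entries such as a trailing separator
        }
        if (!is_file($jar)) {
            $report[$jar] = 'not a file';
            continue;
        }
        $zip = new ZipArchive();
        if ($zip->open($jar) !== true) {
            $report[$jar] = 'unreadable';
            continue;
        }
        // locateName() returns the entry index, or false when it is absent.
        $report[$jar] = $zip->locateName($entryName) !== false ? 'has class' : 'no class';
        $zip->close();
    }

    return $report;
}
```

Run on the machine that actually launches Java (for example a Spark executor rather than the driver), a report with no "has class" entry would point at the jars on that host rather than at the command itself.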