agentile / PHP-Stanford-NLP

PHP interface to Stanford NLP tools (POS Tagger, NER, Parser)

Penn Only #23

Closed AdamNix closed 7 years ago

AdamNix commented 7 years ago

Hi

Is there a way to output only the Penn results?

    // Word and tags is first
    // penn is second
    // typed dependencies is last.

Wonderful program.

agentile commented 7 years ago

Hi Adam,

Do you mean just

$result = $parser->parseSentence("What does the fox say?");
var_dump($result['penn']);

Or do you want $parser->parseSentence("What does the fox say?"); to do only the Penn work, with something like $parser->parseSentence("What does the fox say?", ['penn']); ?

FWIW, I do plan to revisit this project, address a lot of the issues people have been having, and bring it up to date.

AdamNix commented 7 years ago

Thanks for the reply, Anthony. I believe a lot of the issues revolve around trying to keep three things in sync: Java updates, PHP, and Stanford Parser updates. For example, when I run either of the suggestions above I get:

array(2) { ["parent"]=> NULL ["children"]=> array(0) { } }

I am using PHP 7.1, and Stanford Parser: 'C:\stanford-parser-full-2015-04-20\stanford-parser.jar', 'C:\stanford-parser-full-2015-04-20\stanford-parser-3.5.2-models.jar'

Using the original example, I flattened the array and would be able to get rid of the 'words and tags' by searching for '(ROOT'. The Universal Dependencies (typed dependencies) are a little more difficult.

agentile commented 7 years ago

@AdamNix Hmm, yeah, I am not sure why you are getting an array like that.

I just updated the repo and retested against Stanford version 3.8.0. Here is what I am running, Java- and PHP-wise:

agentile@agentile:~/php-stanford$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

agentile@agentile:~/php-stanford$ php -v
PHP 7.0.18-0ubuntu0.16.04.1 (cli) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.0.18-0ubuntu0.16.04.1, Copyright (c) 1999-2017, by Zend Technologies

Here is what I ran: https://github.com/agentile/PHP-Stanford-NLP/blob/master/examples/stanford.php

Notice the commented-out bits: if you enable setDebug, it will echo the actual command sent behind the scenes, e.g.

agentile@agentile:~/php-stanford/examples$ php stanford.php 
DEBUG: Command used: java -mx300m -cp "/home/agentile/php-stanford/stanford-parser-full-2017-06-09/stanford-parser.jar:/home/agentile/php-stanford/stanford-parser-full-2017-06-09/stanford-parser-3.8.0-models.jar" edu.stanford.nlp.parser.lexparser.LexicalizedParser -encoding UTF-8 -outputFormat "wordsAndTags,penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz /tmp/phpnlpparserYRVcyk
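
For anyone comparing that debug line with their own setup, here is a minimal sketch of how a command of this shape could be put together. It is not the library's actual code: `buildParserCommand` and its parameters are hypothetical names, the `/path/to/...` values are placeholders, and no shell escaping is done. The flags are copied from the debug line above. Java expects `:` between classpath entries on Unix and `;` on Windows, which PHP exposes as `PATH_SEPARATOR`.

```php
<?php
// Hypothetical helper, NOT part of PHP-Stanford-NLP: builds a command string
// shaped like the DEBUG line above. Paths are not shell-escaped here.
function buildParserCommand(
    string $jar,
    string $modelsJar,
    string $inputFile,
    array $formats = ['wordsAndTags', 'penn', 'typedDependencies'],
    string $separator = PATH_SEPARATOR // ':' on Unix, ';' on Windows
): string {
    $classpath = $jar . $separator . $modelsJar;

    return sprintf(
        'java -mx300m -cp "%s" edu.stanford.nlp.parser.lexparser.LexicalizedParser'
        . ' -encoding UTF-8 -outputFormat "%s" %s %s',
        $classpath,
        implode(',', $formats),
        'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz',
        $inputFile
    );
}

// Placeholder paths; the separator is forced to ':' so the output is the same on any OS.
echo buildParserCommand(
    '/path/to/stanford-parser.jar',
    '/path/to/stanford-parser-3.8.0-models.jar',
    '/tmp/input.txt',
    ['penn'],
    ':'
), PHP_EOL;
```

If the command echoed by setDebug on Windows shows `:` rather than `;` between the two jars, that would be worth checking first.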

If you set $parser->setOutputFormat('penn');

then it will only produce that format, and the values for wordsAndTags and typedDependencies will be null.

You should be getting back an associative array.
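
A small sketch of what that means for calling code, assuming the result is an array keyed by wordsAndTags, penn, and typedDependencies as described above. The `$result` literal is a hand-written stand-in rather than real parser output, and `availableFormats` is a hypothetical helper, not something the library provides.

```php
<?php
// Hypothetical helper: list the formats that actually came back non-null.
function availableFormats(array $result): array
{
    // array_filter keeps the original string keys, so array_keys returns format names.
    return array_keys(array_filter($result, function ($value) {
        return $value !== null;
    }));
}

// Stand-in for what parseSentence() might return after setOutputFormat('penn').
$result = [
    'wordsAndTags'      => null,
    'penn'              => ['parent' => 'ROOT', 'children' => []],
    'typedDependencies' => null,
];

print_r(availableFormats($result));

// isset() is false for null values, so this guard skips formats that were not requested.
if (isset($result['penn'])) {
    var_dump($result['penn']);
}
```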

So, if possible, could you do a few things:

Tell me whether you are running on Unix or Windows, and paste more of the code you are running; it is hard for me to debug on your behalf without more information. Can you also update to Stanford version 3.8.0?

agentile commented 7 years ago

agentile@agentile:~/php-stanford/examples$ java -mx300m -cp "/home/agentile/php-stanford/stanford-parser-full-2017-06-09/stanford-parser.jar:/home/agentile/php-stanford/stanford-parser-full-2017-06-09/stanford-parser-3.8.0-models.jar" edu.stanford.nlp.parser.lexparser.LexicalizedParser -encoding UTF-8 -outputFormat "wordsAndTags,penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz /tmp/test
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.7 sec].
Parsing file: /tmp/test
Parsing [sent. 1 len. 6]: What does the fox say ?
What/WP does/VBZ the/DT fox/NN say/VB ?/.

(ROOT
  (SBARQ
    (WHNP (WP What))
    (SQ (VBZ does)
      (NP (DT the) (NN fox))
      (VP (VB say)))
    (. ?)))

dobj(say-5, What-1)
aux(say-5, does-2)
det(fox-4, the-3)
nsubj(say-5, fox-4)
root(ROOT-0, say-5)

Parsed file: /tmp/test [1 sentences].
Parsed 6 words in 1 sentences (9.63 wds/sec; 1.61 sents/sec).
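
For the "separate the three parts" question, here is a rough sketch that splits raw text like the listing above into its three sections. It assumes the formats arrive in the order requested and are separated by blank lines, and that the status lines (Loading parser, Parsing file, and so on) have already been removed. `splitParserOutput` is a hypothetical name and does not reflect how the library itself parses this text.

```php
<?php
// Hypothetical helper: split one sentence's raw output into its three sections.
function splitParserOutput(string $raw): array
{
    // Normalise Windows line endings, then split on blank lines.
    $raw = str_replace("\r\n", "\n", trim($raw));
    $sections = preg_split('/\n\s*\n/', $raw);

    return [
        'wordsAndTags'      => $sections[0] ?? null,
        'penn'              => $sections[1] ?? null,
        'typedDependencies' => $sections[2] ?? null,
    ];
}

// Sample text copied from the parser listing above.
$raw = <<<'TXT'
What/WP does/VBZ the/DT fox/NN say/VB ?/.

(ROOT
  (SBARQ
    (WHNP (WP What))
    (SQ (VBZ does)
      (NP (DT the) (NN fox))
      (VP (VB say)))
    (. ?)))

dobj(say-5, What-1)
aux(say-5, does-2)
det(fox-4, the-3)
nsubj(say-5, fox-4)
root(ROOT-0, say-5)
TXT;

$parts = splitParserOutput($raw);
echo $parts['penn'], PHP_EOL;
```

This only handles a single sentence; with several sentences the sections would repeat and would need grouping in threes.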

AdamNix commented 7 years ago

I'm using Windows 7 Pro, and I've updated to Stanford version 3.8.0. I have the Stanford Parser and POS Tagger, both zipped and unzipped, in a folder called Stanford. My PHP file is there too. When I run the program below and dump $result['penn'], I get an empty array:

    // autoload mimics https://github.com/auraphp
    spl_autoload_register(function ($class) {
        // the package namespace
        $ns = 'StanfordNLP';
        // what prefixes should be recognized?
        $prefixes = array(
            "{$ns}\\" => array(
                __DIR__ . '/src/' . $ns,
            ),
        );
        // go through the prefixes
        foreach ($prefixes as $prefix => $dirs) {
            // does the requested class match the namespace prefix?
            $prefix_len = strlen($prefix);
            if (substr($class, 0, $prefix_len) !== $prefix) {
                continue;
            }
            // strip the prefix off the class
            $class = substr($class, $prefix_len);
            // a partial filename
            $part = str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
            // go through the directories to find classes
            foreach ($dirs as $dir) {
                $dir = str_replace('/', DIRECTORY_SEPARATOR, $dir);
                $file = $dir . DIRECTORY_SEPARATOR . $part;
                if (is_readable($file)) {
                    require $file;
                    return;
                }
            }
        }
    });

    //Adam This below code comes back with no result.

    // assume composer autoload
    require_once dirname(dirname(__FILE__)) . DIRECTORY_SEPARATOR . 'vendor' . DIRECTORY_SEPARATOR . 'autoload.php';
    $path = dirname(dirname(__FILE__)) . DIRECTORY_SEPARATOR . 'stanford-parser-full-2017-06-09';
    $parser = new \StanfordNLP\Parser(
        $path . DIRECTORY_SEPARATOR . 'stanford-parser.jar',
        $path . DIRECTORY_SEPARATOR . 'stanford-parser-3.8.0-models.jar'
    );
    //$parser->setDebug(true);
    //$parser->setOutputFormat('penn');
    //$result = $parser->parseSentence("What does the fox say?");
    $result = $parser->parseSentences(["What does the fox say?", "Hi bob, how are you?"]);
    var_dump($result);

//Adam The code below gets me the words and tags, Penn tree, and typed dependencies, but all in one array that cannot be separated easily.

$parser = new \StanfordNLP\PaC:\stanford\stanford-parser-full-2018-06-09\stanford-parser.C:\stanford\stanford-parser-full-2018-06-09\stanford-parser-3.5.2-models.jar' );

var_dump($result);

AdamNix commented 7 years ago

Adding $parser->setOutputFormat('penn'); seems to have done the trick. Here is the code:

    $parser = new \StanfordNLP\Parser(
        'C:\stanford\stanford-parser-full-2017-06-09\stanford-parser.jar',
        'C:\stanford\stanford-parser-full-2017-06-09\stanford-parser-3.8.0-models.jar'
    );

    //var_dump($result);
    $parser->setOutputFormat('penn');
    $result = $parser->parseSentence("What does the fox say?");
    var_dump($result['penn']);

Here is the output: StanSmitharray(2) { ["parent"]=> string(4) "ROOT" ["children"]=> array(1) { [0]=> array(2) { ["parent"]=> string(5) "SBARQ" ["children"]=> array(3) { [0]=> array(2) { ["parent"]=> string(4) "WHNP" ["children"]=> array(1) { [0]=> array(2) { ["parent"]=> string(7) "WP What" ["children"]=> array(0) { } } } } [1]=> array(2) { ["parent"]=> string(2) "SQ" ["children"]=> array(3) { [0]=> array(2) { ["parent"]=> string(8) "VBZ does" ["children"]=> array(0) { } } [1]=> array(2) { ["parent"]=> string(2) "NP" ["children"]=> array(2) { [0]=> array(2) { ["parent"]=> string(6) "DT the" ["children"]=> array(0) { } } [1]=> array(2) { ["parent"]=> string(6) "NN fox" ["children"]=> array(0) { } } } } [2]=> array(2) { ["parent"]=> string(2) "VP" ["children"]=> array(1) { [0]=> array(2) { ["parent"]=> string(6) "VB say" ["children"]=> array(0) { } } } } } } [2]=> array(2) { ["parent"]=> string(3) ". ?" ["children"]=> array(0) { } } } } } } [Finished in 3.5s]
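
In case it helps anyone who wants the familiar one-line bracketed tree back from that nested array, here is a minimal recursive sketch. It assumes every node has a "parent" string and a "children" array, as in the dump above. `toBracketed` is a hypothetical helper and `$tree` is a hand-built fragment of the same shape, not the full result.

```php
<?php
// Hypothetical helper: turn the nested parent/children array into a
// one-line bracketed string.
function toBracketed(array $node): string
{
    $out = '(' . $node['parent'];
    foreach ($node['children'] as $child) {
        $out .= ' ' . toBracketed($child);
    }
    return $out . ')';
}

// Hand-built fragment mirroring the shape of the var_dump output above.
$tree = [
    'parent' => 'ROOT',
    'children' => [[
        'parent' => 'SBARQ',
        'children' => [
            ['parent' => 'WHNP', 'children' => [
                ['parent' => 'WP What', 'children' => []],
            ]],
            ['parent' => '. ?', 'children' => []],
        ],
    ]],
];

echo toBracketed($tree), PHP_EOL; // prints "(ROOT (SBARQ (WHNP (WP What)) (. ?)))"
```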

Many thanks, Anthony.