gerardobort / node-corenlp

CoreNLP @ NodeJS
https://gerardobort.github.io/node-corenlp/docs/
GNU General Public License v3.0

English example, Spanish setup ? #45

Open rdroe opened 6 years ago

rdroe commented 6 years ago

This note may be obvious to more experienced users, but I had a small difficulty because of what may be an inconsistency between the docs and the shipped example configuration.

The first example in the GitHub README.md shows how to specify "English" as the parser language. But the setup instructions that follow it download the Spanish version of the parser, and the English models don't seem to be present. Because of this, I got an error about missing English files.
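A quick way to confirm this situation is to look for an English models jar in the unpacked CoreNLP directory. The following is only a sketch: `has_english_models` is a hypothetical helper, and the glob pattern is an assumption based on the "-spanish-" / "-english-" file naming discussed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given directory contains a file that
# looks like an English CoreNLP models jar.
# Assumption: the jar follows the stanford-english-corenlp-<date>-models.jar
# naming used by the download URLs in this thread.
has_english_models() {
  local dir="$1"
  local jar
  for jar in "$dir"/stanford-english-corenlp-*-models.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -e fails.
    [ -e "$jar" ] && return 0
  done
  return 1
}
```

For example, `has_english_models node_modules/corenlp/corenlp/stanford-corenlp-full-2017-06-09 || echo "English models missing"` would report the problem before the server is started. The directory shown is the one the download script appears to use; adjust it if your layout differs.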

In the super-simple installation instructions in the README, it says to run "npm explore corenlp -- npm run corenlp:download" and "npm explore corenlp -- npm run corenlp:server". As of the time of writing, the bash scripts referenced by those commands seem to install and start the Spanish version of the Stanford parser.

The upshot, I think, is that one of two corrections is needed: either (1) change "English" in the example to "Spanish" and replace the English example sentence with a Spanish one, or (2) modify the download and server-starting bash scripts to fetch and run the English parser models instead of the Spanish ones.

I tried a version of (2) and seem to have gotten it working. In the corenlp:download script, change the downloaded filenames to have "-english-" where they now have "-spanish-". In the corenlp:server script, I simply commented out the "-serverProperties ..." line, where Spanish is specified. (I realize that I am now not specifying any serverProperties, but I will look into that further and add some as I work more on my project.)
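The filename change described above could be scripted roughly as follows. This is a sketch under assumptions, not part of the package: `switch_models_to_english` and `drop_server_properties` are hypothetical helper names, and the second one deletes the "-serverProperties" line rather than commenting it out, because a comment in the middle of a backslash-continued command is easy to get wrong.

```shell
#!/bin/bash
# Hypothetical helper: replace every "-spanish-" with "-english-" in a script,
# keeping the original as <script>.bak.
switch_models_to_english() {
  local script="$1"
  cp "$script" "$script.bak"
  # Write through the existing file so its permissions are preserved.
  sed 's/-spanish-/-english-/g' "$script.bak" > "$script"
}

# Hypothetical helper: remove any line containing "-serverProperties",
# keeping the original as <script>.bak.
drop_server_properties() {
  local script="$1"
  cp "$script" "$script.bak"
  sed '/-serverProperties/d' "$script.bak" > "$script"
}
```

Usage would look like `switch_models_to_english node_modules/corenlp/scripts/corenlp-download` followed by `drop_server_properties node_modules/corenlp/scripts/corenlp-server`. Both edits live under node_modules, so they are lost on a fresh install.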

Thanks, gerardobort, for the parser. It is a real godsend. Others may see right through this; it may be a challenge for me just because I am not yet very familiar with the software. It's up to you.

PeterAJansen commented 4 years ago

Starting from the examples, it took me a while to figure out why the POS tags were not being populated -- it turned out they were being populated with a single Spanish tag. Here are the scripts I used to fix this (option (2) above from rdroe):

1) run npm install

2) replace node_modules/corenlp/scripts/corenlp-download with:

    #!/bin/bash
    # Download CoreNLP
    mkdir `dirname $0`/../corenlp; \
    pushd `dirname $0`/../corenlp; \
    curl -O https://nlp.stanford.edu/software/stanford-english-corenlp-2017-06-09-models.jar; \
    curl -O https://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip; \
    unzip stanford-corenlp-full-2017-06-09.zip && rm stanford-corenlp-full-2017-06-09.zip; \
    mv stanford-english-corenlp-2017-06-09-models.jar stanford-corenlp-full-2017-06-09; \
    popd

3) replace node_modules/corenlp/scripts/corenlp-server with:

    #!/bin/bash
    # Start CoreNLP Server (English setup)
    if [ -f `dirname $0`/../corenlp/stanford-corenlp-full-2017-06-09/build.xml ]; then
      java -Xmx4g \
        -cp "`dirname $0`/../corenlp/stanford-corenlp-full-2017-06-09/*" \
        edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
        -port 9000 \
        -timeout 30000
    else
      # Backticks are escaped so the shell prints them instead of running the command.
      echo "CoreNLP not found in corenlp/. Did you run \`npm run corenlp:download\`?"
    fi
    # -serverProperties StanfordCoreNLP-english.properties \
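Once the server is running on port 9000, one way to check that the POS tags are now English is to send it a sentence directly with curl. This is a sketch, not something from the package: `corenlp_pos_url` and `pos_tags` are hypothetical helpers, the annotator list is an assumption about what is needed for POS tagging, and the grep pattern assumes the JSON output contains `"pos": "..."` fields.

```shell
#!/bin/bash
# Hypothetical helper: build the request URL for a POS-only annotation.
# Host and port default to localhost:9000, matching the server script.
corenlp_pos_url() {
  local host="${1:-localhost}" port="${2:-9000}"
  # URL-encoded form of:
  #   {"annotators":"tokenize,ssplit,pos","outputFormat":"json"}
  printf 'http://%s:%s/?properties=%s\n' "$host" "$port" \
    '%7B%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%22%2C%22outputFormat%22%3A%22json%22%7D'
}

# Hypothetical helper: POST a sentence to the server and print only the
# "pos" fields from the JSON response.
pos_tags() {
  curl -s --data "$1" "$(corenlp_pos_url)" | grep -o '"pos": *"[^"]*"'
}
```

Running `pos_tags 'The quick brown fox jumped over the lazy dog.'` against an English setup should show a mix of Penn Treebank style tags (for example DT, JJ, NN) rather than the same tag repeated for every token.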