OpenTreeOfLife / reference-taxonomy

Open Tree Reference Taxonomy (OTT) tools
BSD 2-Clause "Simplified" License

Use gnparser instead of regular expressions to convert GBIF scientificName column to canonicalName #318

Open jar398 opened 7 years ago

jar398 commented 7 years ago

Right now we use some very ad hoc regular expressions to do this conversion. Using gnparser would be much nicer. Here is a code snippet showing that gnparser can be invoked from Jython (thanks @dimus):

curl -L "https://github.com/GlobalNamesArchitecture/gnparser/releases/download/release-0.3.3/gnparser-assembly-0.3.3.jar" >gnparser-assembly-0.3.3.jar
curl -L "http://mumble.net/~jar/tmp/jython-standalone.jar" >jython-standalone.jar
cat >foo.py <<EOF
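# Load the gnparser entry point from the jar named in JYTHONPATH
from org.globalnames.parser import ScientificNameParser
# Parse a name that carries an authorship suffix
z = ScientificNameParser.instance().fromString("Homo sapiens L.")
# Print the canonized name (Jython 2.x print statement)
print z.canonized(False).get()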
EOF
export JYTHONPATH=gnparser-assembly-0.3.3.jar
JYTHON=jython-standalone.jar
java -jar $JYTHON foo.py
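
As a rough sketch of how the snippet above might slot into the conversion step, the helper below uses gnparser when it is importable (that is, under Jython with the jar on JYTHONPATH) and otherwise falls back to a regular expression. The function name `canonical_name` and the `_FALLBACK` pattern are hypothetical; the pattern is only a stand-in for the current ad hoc approach, not the expressions the project actually uses. The gnparser calls are exactly the ones from the snippet above, and nothing here handles names that gnparser cannot parse.

```python
import re

try:
    # Importable only under Jython with the gnparser jar on JYTHONPATH,
    # as set up in the shell snippet above.
    from org.globalnames.parser import ScientificNameParser
except ImportError:
    ScientificNameParser = None

# Hypothetical stand-in for the ad hoc regex approach: a capitalized genus
# followed by up to two lowercase epithets. Not the project's real pattern.
_FALLBACK = re.compile(r"^([A-Z][a-z-]+(?: [a-z-]+){0,2})")


def canonical_name(scientific_name):
    """Return a canonical name for a GBIF-style scientificName string."""
    if ScientificNameParser is not None:
        # Same calls as in the snippet above.
        parsed = ScientificNameParser.instance().fromString(scientific_name)
        return parsed.canonized(False).get()
    # Regex fallback: keep the leading genus/epithet words, drop the rest.
    match = _FALLBACK.match(scientific_name.strip())
    return match.group(1) if match else scientific_name
```

With the regex fallback, `canonical_name("Homo sapiens L.")` gives `"Homo sapiens"`. The same fallback turns the made-up name `"Aus bus de Candolle"` into `"Aus bus de"`, because it mistakes the lowercase author particle for an epithet. That kind of edge case is the motivation for handing the job to a real parser.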