OpenSextant / OpenSextantToolbox

A geotagger and entity extractor

simplest way to generate opensextant JSON output #10

Closed johnrfrank closed 8 years ago

johnrfrank commented 9 years ago

I would like to generate a set of example JSON output from OpenSextant. What is the simplest way to do this? Is there a running instance somewhere that I could send a few sample documents?

Things I tried:

That fails when it turns out that my solr+jetty system does not have data loaded into it. The various build.xml files for ant in the opensextant git repo don't "just work."

then the examples.xml is more valid (looks like that directory capitalization naming error might have resulted from someone developing on MacOS's default case-insensitive file system)... but it fails like this:

$ ant -f examples.xml
Buildfile: /ebs/third/opensextant/opensextant-toolbox-2.0/examples.xml

run.examples:
[echo]
[echo] -----------------------------------------------
[echo] Running the Geotagger Example
[echo] -----------------------------------------------
[java] Initializing
[java] Exception in thread "main" java.lang.IllegalArgumentException: Parameter 'directory' is not a directory
[java]     at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:460)
[java]     at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:546)
[java]     at org.opensextant.examples.GeotaggerExample.main(GeotaggerExample.java:75)
[java] Java Result: 1
[echo]
[echo] -------------------------------------------------------------------------------
[echo] Running the General Purpose Entity Extractor Example
[echo] -------------------------------------------------------------------------------
[java] Initializing
[java] Exception in thread "main" java.lang.IllegalArgumentException: Parameter 'directory' is not a directory
[java]     at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:460)
[java]     at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:546)
[java]     at org.opensextant.examples.GeneralPurposeTaggerExample.main(GeneralPurposeTaggerExample.java:73)
[java] Java Result: 1
[echo]
[echo] --------------------------------------------------
[echo] Running the Solr Matcher Example
[echo] --------------------------------------------------
[java] No arg supplied for location of solr gazetteer. Using environment variable
[java] Exception reading text from filematcherTest.txt

Thanks for any pointers.

jrf

johnrfrank commented 9 years ago

This command almost works... but how does one populate the SOLR system?

$ java -Xmx1500m -Dgate.home=. -Dlog4j.configration=file:./etc/log4jproperites -Dsolr.home=./Gazetteer/solr -cp "./lib/:./lib/GATE/:./lib/Logging/:./lib/Solr/" org.opensextant.examples.GeotaggerExample ./LanguageResources/GAPPs/OpenSextant_Geotagger.gapp /data/west-africa-clean_visible
Initializing
log4j:WARN No appenders could be found for logger (gate.Gate).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Done Initializing

Exception in thread "main" java.lang.IllegalStateException: This PR hasn't been init'ed!
    at org.opensextant.toolbox.NaiveTaggerSolrPR.execute(NaiveTaggerSolrPR.java:138)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:154)
    at gate.creole.SerialController.executeImpl(SerialController.java:153)
    at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:129)
    at gate.creole.AbstractController.execute(AbstractController.java:75)
    at org.opensextant.examples.GeotaggerExample.main(GeotaggerExample.java:109)

dlutz2 commented 9 years ago

Will look for and resolve any case-sensitivity problems in the build process. Until that gets fixed, the data loading step is defined in the ant task "build.gaz". This creates a skeleton Solr directory structure and then loads the data into it. The load itself is done by the "load.gazetteer" ant task, which runs the Java class "org.opensextant.matching.DataLoader" to do the actual loading.
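
A minimal sketch of invoking that step from the shell. The thread does not say which build file defines "build.gaz", so the BUILD_FILE default below is an assumption (a placeholder, not a confirmed path), and the run_gaz_build helper is a hypothetical wrapper. Only "ant -f" and the "build.gaz" name come from this thread.

```shell
#!/bin/sh
# ASSUMPTION: path to the build file that defines "build.gaz".
# The thread does not name it; override BUILD_FILE to match your checkout.
BUILD_FILE="${BUILD_FILE:-build.xml}"

# Hypothetical helper: checks prerequisites, then runs the gazetteer build.
run_gaz_build() {
    if ! command -v ant >/dev/null 2>&1; then
        echo "ant not found on PATH" >&2
        return 1
    fi
    if [ ! -f "$BUILD_FILE" ]; then
        echo "build file not found: $BUILD_FILE" >&2
        return 1
    fi
    # Per the comment above, build.gaz creates the skeleton Solr directory
    # structure and loads the data into it (via load.gazetteer).
    ant -f "$BUILD_FILE" build.gaz
}
```

After setting BUILD_FILE to the right file, calling run_gaz_build should populate the Solr directory, provided any data files the load expects are already in place (the thread does not say where they come from).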

dlutz2 commented 8 years ago

No further activity