HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

"Standalone" is not standalone #9

Closed jzell closed 9 years ago

jzell commented 9 years ago
What steps will reproduce the problem?
1. Download HeidelTime
2. Try to run the example on the front page

What is the expected output? 

A date.

What do you see instead?

Messages about Perl and missing TreeTagger files:

ryan@3G08:~/Downloads/heideltime-standalone-1.3$ pwd
/home/ryan/Downloads/heideltime-standalone-1.3
ryan@3G08:~/Downloads/heideltime-standalone-1.3$ cat cat.txt 
Jannik Strötgen, Julian Zell, and Michael Gertz: HeidelTime: Tuning English and Developing
Spanish Resources for TempEval-3. In SemEval13, 15-19, 2013
ryan@3G08:~/Downloads/heideltime-standalone-1.3$ java -jar de.unihd.dbs.heideltime.standalone.jar -t news cat.txt
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use TreeTagger
tokenizer: english-abbreviations
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use TreeTagger
tokenizer: english.par
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use TreeTagger
tokenizer: utf8-tokenize.perl
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] 
Cannot find tree tagger (SET ME IN CONFIG.PROPS!/cmd/utf8-tokenize.perl). Make sure
that path to tree tagger is set correctly in config.props!
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] 
If path is set correctly:

[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] Maybe you need to download
the TreeTagger tagger-scripts.tar.gz
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] from ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tagger-scripts.tar.gz
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] Extract this file and copy
the missing file into the corresponding TreeTagger directories.
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy english-abbreviations
into SET ME IN CONFIG.PROPS!/lib
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy english.par
into SET ME IN CONFIG.PROPS!/lib
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy utf8-tokenize.perl
into SET ME IN CONFIG.PROPS!/cmd
ryan@3G08:~/Downloads/heideltime-standalone-1.3$ 

What version of the product are you using? On what operating system?

standalone.jar

Please provide any additional information below.

Original issue reported on code.google.com by compton.ryan on 2013-06-21 02:02:31

jzell commented 9 years ago
Dear Ryan,

Thank you for opening this issue. HeidelTime is available as
- a UIMA component, and as
- a standalone version, which can be used without UIMA.

In both cases, HeidelTime requires some preprocessing, namely, sentence splitting,
tokenization, and part-of-speech tagging. For all languages except Arabic and Vietnamese,
we use the TreeTagger for these tasks.

As explained in the readme of the UIMA version and in the Manual of the standalone
version, you have to download the TreeTagger and its modules for the languages you
want to process. In the standalone version, you then have to set the path to the TreeTagger
in config.props.
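
To make the file requirements concrete, here is a minimal sketch, not part of HeidelTime, that checks a TreeTagger directory for the three English files named in the error output above. The class name `TreeTaggerCheck` and the method `missingFiles` are hypothetical; the relative paths are taken from the log, and other languages would need their own parameter files.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HeidelTime.
class TreeTaggerCheck {
    // Relative paths copied from the error output above (English only).
    static final String[] REQUIRED = {
        "cmd/utf8-tokenize.perl",
        "lib/english.par",
        "lib/english-abbreviations"
    };

    // Returns the required files that are absent under the given TreeTagger directory.
    static List<String> missingFiles(Path treeTaggerHome) {
        List<String> missing = new ArrayList<>();
        for (String relative : REQUIRED) {
            if (!Files.isRegularFile(treeTaggerHome.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TreeTaggerCheck <path-to-treetagger>");
            System.exit(2);
        }
        List<String> missing = missingFiles(Paths.get(args[0]));
        if (missing.isEmpty()) {
            System.out.println("All expected TreeTagger files were found.");
        } else {
            for (String relative : missing) {
                System.out.println("Missing: " + relative);
            }
            System.exit(1);
        }
    }
}
```

Running it against the directory you intend to put into config.props shows which files still need to be copied before the standalone jar is started.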

Please have a look at the Manual for more details.

Thanks,
Jannik

Original issue reported on code.google.com by jannik.stroetgen on 2013-06-21 07:51:00

jzell commented 9 years ago
Ok, thanks. My goal is to use heideltime from Hadoop. If it's all in Java, this will
be easy.

Original issue reported on code.google.com by compton.ryan on 2013-06-21 16:32:50

jzell commented 9 years ago
Ok, I'm still really confused here. I can't use Perl for what I am doing. Is HeidelTime
Java or not?

Original issue reported on code.google.com by compton.ryan on 2013-06-24 22:41:38

jzell commented 9 years ago
Dear Ryan,

I would kindly refer you to the Manual, where you can find a description of how to run the
standalone version from the command line. Make sure that you have the TreeTagger installed
and the path to the TreeTagger set correctly in config.props; as already mentioned,
everything is explained in the Manual.

If you run into specific problems, we are happy to help, but then we need to know in
more detail what you are actually trying to do.

Thanks,
Jannik

Original issue reported on code.google.com by jannik.stroetgen on 2013-06-26 12:33:29

jzell commented 9 years ago
My goal is to deploy HeidelTime on a Hadoop cluster. Currently, I search for dates with
regex. I'd like to improve on that.

The TreeTagger dependency is where I am stuck. It runs fine on my laptop, but, because
it's not Java, it's difficult (impossible?) to install/run TreeTagger on every node
in my cluster.

Can I somehow remove TreeTagger and still get ok results? Perhaps there is a Java library
out there that I can use in TreeTaggerWrapper.java instead?

Original issue reported on code.google.com by compton.ryan on 2013-06-26 19:05:38

jzell commented 9 years ago
Hi Ryan,

You can use the Stanford POS Tagger instead of the TreeTagger, although currently
not with the standalone version. We plan to add a parameter to the standalone version for
choosing which POS tagger to use, but this is not implemented yet.

You could replace the TreeTaggerWrapper with the Stanford POS Wrapper in the source
code of the standalone version. Keep in mind that HeidelTime requires
sentence information: without it, you won't get any results. Without
token and POS information you can still get results, but they will probably be worse.
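
Since sentence information is the one hard requirement, it may help to see that sentence boundaries can be produced in plain Java. The sketch below uses the JDK's `java.text.BreakIterator`. It is only an illustration under that assumption, not HeidelTime's or the Stanford wrapper's code, and `SentenceSplitSketch` and `splitSentences` are hypothetical names. Wiring the resulting sentences into HeidelTime's own annotations is not shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not part of HeidelTime.
class SentenceSplitSketch {
    // Splits text into sentences using the JDK's rule-based sentence iterator.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The meeting is on Monday. It was moved from June 21, 2013.";
        for (String sentence : splitSentences(text, Locale.ENGLISH)) {
            System.out.println(sentence);
        }
    }
}
```

A rule-based splitter like this is cruder than a trained tagger, for example around abbreviations, so results on real news text would need checking.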

Thanks,
Jannik

Original issue reported on code.google.com by jannik.stroetgen on 2013-06-28 09:30:43