HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

HeidelTime can now also be used for English temponym tagging. For details, see our TempWeb'16 paper.

HeidelTime contains automatically created resources for 200+ languages in addition to manually created ones for 13 languages. For further details, take a look at our EMNLP 2015 paper.

About HeidelTime

HeidelTime is a multilingual, domain-sensitive temporal tagger developed at the Database Systems Research Group at Heidelberg University. It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. HeidelTime is available as a UIMA annotator and as a standalone version.
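
To make "extract and normalize" concrete, here is a toy Java sketch (not HeidelTime code) that recognizes explicit English dates such as "March 5, 2015" with one regular expression and wraps each match in a TIMEX3 tag whose value attribute holds the normalized date. The class name ToyTimexTagger and its month table are hypothetical; HeidelTime's actual rules cover many more expression types (TIMEX3 distinguishes DATE, TIME, DURATION and SET) as well as relative and underspecified expressions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration only (not HeidelTime code): tag explicit English dates as TIMEX3. */
class ToyTimexTagger {
    // Hypothetical normalization table: month name -> two-digit month number.
    private static final Map<String, String> MONTHS = Map.ofEntries(
        Map.entry("January", "01"), Map.entry("February", "02"), Map.entry("March", "03"),
        Map.entry("April", "04"), Map.entry("May", "05"), Map.entry("June", "06"),
        Map.entry("July", "07"), Map.entry("August", "08"), Map.entry("September", "09"),
        Map.entry("October", "10"), Map.entry("November", "11"), Map.entry("December", "12"));

    // Extraction pattern for "<Month> <day>, <year>", e.g. "March 5, 2015".
    private static final Pattern DATE = Pattern.compile(
        "\\b(" + String.join("|", MONTHS.keySet()) + ") (\\d{1,2}), (\\d{4})\\b");

    /** Returns the text with every matched date wrapped in a TIMEX3 element. */
    static String tag(String text) {
        Matcher m = DATE.matcher(text);
        StringBuilder out = new StringBuilder();
        int tid = 0;
        while (m.find()) {
            // Normalization: build an ISO-style YYYY-MM-DD value from the matched parts.
            String value = m.group(3) + "-" + MONTHS.get(m.group(1))
                + "-" + String.format("%02d", Integer.parseInt(m.group(2)));
            String timex = "<TIMEX3 tid=\"t" + (++tid) + "\" type=\"DATE\" value=\""
                + value + "\">" + m.group() + "</TIMEX3>";
            m.appendReplacement(out, Matcher.quoteReplacement(timex));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: The meeting on <TIMEX3 tid="t1" type="DATE" value="2015-03-05">March 5, 2015</TIMEX3> was postponed.
        System.out.println(tag("The meeting on March 5, 2015 was postponed."));
    }
}
```

Hand-written patterns plus normalization tables keep the sketch readable; a real tagger additionally has to resolve overlapping matches and expressions that need context.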

HeidelTime currently contains hand-crafted resources for 13 languages: English, German, Dutch, Vietnamese, Arabic, Spanish, Italian, French, Chinese, Russian, Croatian, Estonian and Portuguese. In addition, starting with version 2.0, HeidelTime contains automatically created resources for more than 200 languages. Although these resources are of lower quality than the manually created ones, temporal tagging of many of these languages has never been addressed before. Thus, HeidelTime can be used as a baseline for temporal tagging of all these languages or as a starting point for developing temporal tagging capabilities for them.

HeidelTime distinguishes between news-style documents and narrative-style documents (e.g., Wikipedia articles) in all languages. For English, colloquial texts (e.g., tweets and SMS) and scientific articles (e.g., clinical trials) are supported as well.
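
Why the document type matters: as we understand the approach described in paper [4], relative expressions such as "yesterday" or "the following day" are normalized against the document creation time in news-style documents, whereas narratives usually lack a meaningful creation time, so the reference time is taken from dates mentioned earlier in the text. The simplified Java sketch below illustrates only that anchoring choice; the enum DocType, the method resolve, and the day-offset representation are hypothetical, and HeidelTime's real strategy is more involved.

```java
import java.time.LocalDate;

/** Toy illustration only (not HeidelTime code): choose the anchor for relative expressions. */
class ReferenceTimeSketch {
    enum DocType { NEWS, NARRATIVE }

    /**
     * Resolves a relative day offset (e.g. -1 for "yesterday", +1 for "the next day").
     * NEWS: anchor on the document creation time (dct).
     * NARRATIVE: anchor on the most recently mentioned date in the text.
     * Returns null when no anchor is available (the value stays underspecified).
     */
    static LocalDate resolve(int dayOffset, DocType type, LocalDate dct, LocalDate lastMentioned) {
        LocalDate anchor = (type == DocType.NEWS) ? dct : lastMentioned;
        return anchor == null ? null : anchor.plusDays(dayOffset);
    }

    public static void main(String[] args) {
        LocalDate dct = LocalDate.of(2015, 9, 17);        // publication date of a news article
        LocalDate mentioned = LocalDate.of(1914, 6, 28);  // date mentioned earlier in a narrative
        System.out.println(resolve(-1, DocType.NEWS, dct, mentioned));     // 2015-09-16
        System.out.println(resolve(1, DocType.NARRATIVE, dct, mentioned)); // 1914-06-29
    }
}
```

The same surface expression thus receives different normalized values depending on the document type, which is why the type has to be chosen before tagging.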

Want to see what it can do before you delve in? Take a look at our online demo.


Latest downloads

Maven

The following entries in your pom.xml provide a minimal set of dependencies:

        <dependency>
            <groupId>org.apache.uima</groupId>
            <artifactId>uimaj-core</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>com.github.heideltime</groupId>
            <artifactId>heideltime</artifactId>
            <version>2.2</version>
        </dependency>

Some features require further dependencies; see our Maven wiki page for details.

Publications

If you use HeidelTime, please cite the appropriate paper: in general, this is the journal paper [4]; if you use HeidelTime with automatically created resources, please cite paper [10]; if you use HeidelTime for temponym tagging, please cite paper [11].

  1. Strötgen, Gertz: HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. SemEval'10.
  2. Strötgen, Gertz: Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC'12.
  3. Strötgen et al.: HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3. SemEval'13.
  4. Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013.
  5. Strötgen et al.: Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese. TALIP, 2014.
  6. Li et al.: Chinese Temporal Tagging with HeidelTime. EACL'14.
  7. Strötgen et al.: Extending HeidelTime for Temporal Expressions Referring to Historic Dates. LREC'14.
  8. Manfredi et al.: HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags. EVALITA'14.
  9. Strötgen: Domain-sensitive Temporal Tagging for Event-centric Information Retrieval. PhD Thesis.
  10. Strötgen, Gertz: A Baseline Temporal Tagger for All Languages. EMNLP'15.
  11. Kuzey, Strötgen, Setty, Weikum: Temponym Tagging: Temporal Scopes for Textual Phrases. TempWeb'16.

Language Resources

We want to thank the following researchers for their efforts to develop HeidelTime resources:

  1. Dutch resources: Matje van de Camp, Tilburg University
  2. French resources: Véronique Moriceau, LIMSI - CNRS
  3. Russian resources: Elena Klyachko
  4. Croatian resources: Luka Skukan, University of Zagreb
  5. Portuguese resources: Zunsik Lim

If you plan to develop manual resources for a language, please feel free to use our automatically created resources as a starting point.


Tell me more!

HeidelTime was developed in Java with extensibility in mind, both in terms of language-specific resources and in terms of programmatic functionality.
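
As a rough illustration of why keeping language-specific resources separate from code eases extension, the hypothetical Java sketch below applies one language-independent rule ("<month> <year>" to a YYYY-MM value) with interchangeable per-language normalization tables. The class, field, and method names are invented for illustration and do not mirror HeidelTime's actual resource files or rule syntax.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not HeidelTime's resource format): data per language, logic shared. */
class PluggableResourcesSketch {
    /** Language-specific data only; supporting a new language means adding another instance. */
    static final class LanguageResources {
        final String language;
        final Map<String, String> monthNormalization; // lower-case month name -> "MM"

        LanguageResources(String language, Map<String, String> monthNormalization) {
            this.language = language;
            this.monthNormalization = monthNormalization;
        }
    }

    // Deliberately tiny tables for illustration.
    static final LanguageResources ENGLISH = new LanguageResources("english",
        Map.of("january", "01", "june", "06", "october", "10"));
    static final LanguageResources GERMAN = new LanguageResources("german",
        Map.of("januar", "01", "juni", "06", "oktober", "10"));

    /** Language-independent rule: "<month> <year>" -> TIMEX3-style value "YYYY-MM". */
    static Optional<String> normalizeMonthYear(String month, int year, LanguageResources res) {
        String mm = res.monthNormalization.get(month.toLowerCase(Locale.ROOT));
        return Optional.ofNullable(mm).map(m -> year + "-" + m);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMonthYear("June", 2016, ENGLISH)); // Optional[2016-06]
        System.out.println(normalizeMonthYear("Juni", 2016, GERMAN));  // Optional[2016-06]
        System.out.println(normalizeMonthYear("Juni", 2016, ENGLISH)); // Optional.empty
    }
}
```

Adding a language in this design only means supplying another LanguageResources instance; the rule itself stays untouched.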

Get your hands dirty!