An adaptation of the MarMoT higher-order CRF tagger to generic sequence-to-sequence tasks, as described in our paper.
Please use the following citation:
@inproceedings{Schnober:2016:Coling,
author = {Carsten Schnober and Steffen Eger and Erik-Lân Do Dinh and Iryna Gurevych},
title = {Still not there? Comparing Traditional Sequence-to-Sequence Models to
Encoder-Decoder Neural Networks on Monotone String Translation Tasks},
month = dec,
year = {2016},
booktitle = {Proceedings of the 26th International Conference on Computational
Linguistics (COLING)},
pages = {1703--1714},
location = {Osaka, Japan},
language = {English},
}
Abstract: We analyze the performance of encoder-decoder neural models and compare them with well-known established methods. The latter represent different classes of traditional approaches that are applied to the monotone sequence-to-sequence tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion, and lemmatization. Such tasks are of practical relevance for various higher-level research fields including digital humanities, automatic text correction, and speech recognition. We investigate how well generic deep-learning approaches adapt to these tasks, and how they perform in comparison with established and more specialized methods, including our own adaptation of pruned CRFs.
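To illustrate what "monotone" means in these tasks, here is a small sketch (for exposition only, not code from this repository): in spelling correction, input and output character sequences stay in the same order, so their alignment never crosses. The typo/correction pair below is a hypothetical example in the spirit of the Twitter typo data.

```python
# Illustrative sketch: a monotone string translation pair, as in
# spelling correction. Input and output characters stay in order,
# so the alignment between them is monotone (no crossing edits).
from difflib import SequenceMatcher

noisy, clean = "recieve", "receive"  # hypothetical typo/correction pair

# get_opcodes() yields edit operations left to right; because both
# index ranges only ever advance, the alignment is monotone.
ops = SequenceMatcher(None, noisy, clean).get_opcodes()
for tag, i1, i2, j1, j2 in ops:
    print(f"{tag:8s} {noisy[i1:i2]!r} -> {clean[j1:j2]!r}")

# Applying the output side of the opcodes in order reconstructs
# the corrected string, which is exactly the monotone property.
assert "".join(clean[j1:j2] for _, _, _, j1, j2 in ops) == clean
```

A higher-order CRF (like the adapted MarMoT) exploits this property by treating the task as sequence labeling over the aligned input, which is what makes it competitive with encoder-decoder models on these tasks.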
Contact persons:
http://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
src/
-- this folder contains the code and detailed instructions
src/data/
-- sample data from the Twitter typo corpus
See src/README.md for details!