google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Integrate Penn Discourse Treebank (PDTB) parser #585

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The PDTB parser is available under the GPL here:

http://wing.comp.nus.edu.sg/~linzihen/parser/index.html

It's written in Ruby but can be run with JRuby, so integration into DKPro 
should be fairly seamless.

Integration could be done in two steps:

1.) A type system for PDTB under ASL (e.g., 
de.tudarmstadt.ukp.dkpro.core.api.discourse-asl)

2.) The parser integration itself under GPL (e.g., 
de.tudarmstadt.ukp.dkpro.discourse.pdtbparser)

Original issue reported on code.google.com by ivan.hab...@gmail.com on 28 Jan 2015 at 11:09

GoogleCodeExporter commented 9 years ago

Original comment by ivan.hab...@gmail.com on 28 Jan 2015 at 11:10

GoogleCodeExporter commented 9 years ago
There is one big obstacle here: a preprocessing tool in the parser 
normalizes words to British English (BrE). Mapping the output back to the 
input (different tokenization, some words changed) is therefore quite an 
effort, or we might have to hack the Ruby code.
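
To illustrate the mapping problem, here is a minimal sketch of a token-level alignment, assuming both the original tokens and the parser's normalized tokens are available as lists. `align_tokens` is a hypothetical helper and not part of the PDTB parser or DKPro; it uses Python's standard `difflib` only to show the idea, and a real integration would need the equivalent logic on the Java side.

```python
from difflib import SequenceMatcher


def align_tokens(original, normalized):
    """Map each index in `normalized` to a span (start, end) in `original`.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    matcher = SequenceMatcher(
        a=[t.lower() for t in original],
        b=[t.lower() for t in normalized],
        autojunk=False,
    )
    mapping = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Identical tokens, or an equal-sized block that is assumed
            # to be a 1:1 spelling change (e.g. color -> colour).
            for k in range(j2 - j1):
                mapping[j1 + k] = (i1 + k, i1 + k + 1)
        elif tag in ("replace", "insert"):
            # Tokenization differs: fall back to mapping every token in
            # the block to the whole original span it replaced.
            for j in range(j1, j2):
                mapping[j] = (i1, i2)
        # "delete": original tokens without a counterpart, nothing to map.
    return mapping


# Made-up example: one re-tokenized contraction and two BrE spellings.
original = ["I", "can't", "recognize", "the", "color", "."]
normalized = ["I", "ca", "n't", "recognise", "the", "colour", "."]
mapping = align_tokens(original, normalized)
```

In this example "colour" maps back to the single token "color", while "ca", "n't" and "recognise" can only be mapped to the combined span covering "can't" and "recognize". That coarse fallback is the part that would need real effort (or a patch to the Ruby code) to get right.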

Original comment by ivan.hab...@gmail.com on 16 Feb 2015 at 7:26

GoogleCodeExporter commented 9 years ago
You can enter your observations here: 
https://code.google.com/p/dkpro-core-asl/wiki/UnintegratableSoftware

That doesn't necessarily mean that the software will *actually* not be 
integrated. Maybe we will find a workaround or decide that maintaining a 
patch/fork is acceptable, e.g. because the code is no longer maintained upstream.

Original comment by richard.eckart on 16 Feb 2015 at 7:45