aminorex / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

French chunker model not working #526

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
There are actually two problems.

The first problem is that the POS tagger model for French is not compatible 
with the chunker model. The POS tagger emits a "SENT" tag for the full stop at 
the end of a sentence, but the chunker expects "PONCT:S".

The second problem is that the default "flush sequence" does not work with the 
French chunker model. DKPro Core/TT4J emits this sequence at the end of a 
document to force TreeTagger to write all of its results to stdout.

Without having looked at your approach, I assume that it works because it 
forcibly terminates the chunker process at the end of a document, which also 
forces the tagger to output its results. However, we want to avoid that and 
keep the process running in the background, so that we do not pay the overhead 
of starting the process and loading the model again for every document.
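To illustrate why a flush sequence lets the process stay alive, here is a minimal sketch. It is not TT4J's API: `FlushSketch`, `LaggingTagger` and `Wrapper` are hypothetical names, and the fake tagger only loosely models TreeTagger by holding back each result until it has received `lookahead` further tokens. The wrapper appends the flush tokens (here the French sequence proposed in this issue) after each document, keeps only the output that belongs to the document, and discards the flush output, including the part that only arrives with the next document. This assumes the flush sequence is at least `lookahead` tokens long.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FlushSketch {

    // Hypothetical stand-in for a long-running tagger process: the result for a
    // token is only written once `lookahead` further tokens have been received.
    static final class LaggingTagger {
        private final int lookahead;
        private final Deque<String> pending = new ArrayDeque<>();

        LaggingTagger(int lookahead) {
            this.lookahead = lookahead;
        }

        // Writes one token to the "process" and returns any output now available.
        List<String> feed(String token) {
            pending.addLast(token);
            List<String> out = new ArrayList<>();
            while (pending.size() > lookahead) {
                out.add(pending.removeFirst() + "/TAG");
            }
            return out;
        }
    }

    // Hypothetical wrapper that keeps one tagger alive across documents and
    // appends a flush sequence after each document.
    static final class Wrapper {
        private final LaggingTagger tagger;
        private final List<String> flush;
        private int lateFlushLines = 0; // flush output still owed by the tagger

        Wrapper(LaggingTagger tagger, List<String> flush) {
            this.tagger = tagger;
            this.flush = flush;
        }

        List<String> process(List<String> doc) {
            List<String> raw = new ArrayList<>();
            for (String t : doc) raw.addAll(tagger.feed(t));
            for (String t : flush) raw.addAll(tagger.feed(t));

            List<String> result = new ArrayList<>();
            int flushLinesNow = 0;
            for (String line : raw) {
                if (lateFlushLines > 0) {
                    lateFlushLines--;        // leftover output of the previous flush
                } else if (result.size() < doc.size()) {
                    result.add(line);        // output belonging to this document
                } else {
                    flushLinesNow++;         // output of this document's flush
                }
            }
            // Flush output that has not arrived yet will precede the next document.
            lateFlushLines += flush.size() - flushLinesNow;
            return result;
        }
    }

    public static void main(String[] args) {
        List<String> flush = List.of("Ce-PRO:DEM", "est-VER:pres", "la-DET:ART",
                "fin-NOM", "mon-DET:POS", "ami-NOM", ".-PONCT:S");
        Wrapper withFlush = new Wrapper(new LaggingTagger(3), flush);
        System.out.println(withFlush.process(List.of("Bonjour", "!"))); // [Bonjour/TAG, !/TAG]
        System.out.println(withFlush.process(List.of("Salut")));        // [Salut/TAG]

        // Without a flush sequence the results stay stuck inside the process.
        Wrapper noFlush = new Wrapper(new LaggingTagger(3), List.of());
        System.out.println(noFlush.process(List.of("Bonjour", "!")));   // []
    }
}
```

The alternative, killing the process after each document, would also release the pending output, but then the model has to be loaded again for the next document.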

I am implementing these solutions:

1) TT4J is extended to allow a custom flush sequence.
2) The chunker model metadata for French is extended to provide a different 
flush sequence: 

  Ce-PRO:DEM
  est-VER:pres
  la-DET:ART
  fin-NOM
  mon-DET:POS
  ami-NOM
  .-PONCT:S

3) The DKPro Core TreeTaggerChunker component is changed to allow tags to be 
forcibly mapped to other tags before they are passed on to the chunker.
4) The chunker model metadata for French is extended to map SENT to PONCT:S.
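Steps 3 and 4 can be sketched as follows. This is not the actual TreeTaggerChunker code: `TagMappingSketch`, `mapTags` and `toChunkerInput` are hypothetical names, the mapping is hard-coded instead of being read from model metadata, and the example sentence and its tags are illustrative. The sketch replaces mapped tags, leaves all other tags untouched, and then joins each token with its tag in the `word-TAG` form used by the flush sequence listed in step 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TagMappingSketch {

    // Hypothetical mapping that would come from the French chunker model metadata:
    // the POS model emits SENT for a sentence-final full stop, the chunker expects PONCT:S.
    static final Map<String, String> TAG_MAPPING = Map.of("SENT", "PONCT:S");

    // Replaces every tag that has an entry in the mapping; other tags pass through.
    static List<String> mapTags(List<String> tags, Map<String, String> mapping) {
        List<String> mapped = new ArrayList<>(tags.size());
        for (String tag : tags) {
            mapped.add(mapping.getOrDefault(tag, tag));
        }
        return mapped;
    }

    // Joins each token with its (mapped) tag as "word-TAG", one line per token.
    static List<String> toChunkerInput(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        List<String> lines = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); i++) {
            lines.add(tokens.get(i) + "-" + tags.get(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Il", "pleut", ".");
        List<String> posTags = List.of("PRO:PER", "VER:pres", "SENT");
        List<String> lines = toChunkerInput(tokens, mapTags(posTags, TAG_MAPPING));
        // Prints Il-PRO:PER, pleut-VER:pres and .-PONCT:S on separate lines.
        lines.forEach(System.out::println);
    }
}
```

Keeping the mapping in the model metadata rather than in the component means that other languages whose POS and chunker tagsets already agree need no mapping at all.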

Original issue reported on code.google.com by richard.eckart on 19 Nov 2014 at 3:06

GoogleCodeExporter commented 9 years ago
Original thread on users list: 
https://groups.google.com/d/topic/dkpro-core-user/joTO9hP8Wuo/discussion

Original comment by richard.eckart on 19 Nov 2014 at 3:07

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r3070.

Original comment by richard.eckart on 19 Nov 2014 at 6:15

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r3071.

Original comment by richard.eckart on 19 Nov 2014 at 7:00

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 19 Nov 2014 at 7:03