dkt-projekt / e-SMT

Web service for the Moses Statistical Machine Translation

Improve MT output by using recognized entities #4

Open PeterBourgonje opened 8 years ago

PeterBourgonje commented 8 years ago

By using the recognized entities and bypassing them in the translation process (for example, by getting the correct name from external sources such as DBpedia), we can probably improve the quality of the output, as discussed several times already in internal meetings. I think a good starting point would be to get all entities from a NIF document. There are plenty of methods for this in the DKTCommon NIFReader class; for example, `public static List<String[]> extractEntityIndices(Model nifModel)` and `public static List<String[]> extractEntities(Model nifModel)` would be good to start with, I think.
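To make the bypass idea concrete, here is a minimal sketch of one way it could work: replace each entity span with a placeholder token, translate the masked text, then put the externally resolved names back. Everything in it is an assumption for illustration. `EntityBypass`, `EntitySpan`, the `__ENTn__` placeholder format, and the `translator` function are hypothetical names, not part of e-SMT or DKTCommon. It also does not call `NIFReader`, because the layout of the returned `String[]` entries is not described here; the spans are assumed to have already been converted to character offsets, with the target-language name already looked up (for example in DBpedia).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of e-SMT or DKTCommon.
final class EntityBypass {

    /** Assumed shape of one entity: [start, end) character offsets in the
     *  source text plus the target-language name from an external source. */
    static final class EntitySpan {
        final int start;
        final int end;
        final String targetName;

        EntitySpan(int start, int end, String targetName) {
            this.start = start;
            this.end = end;
            this.targetName = targetName;
        }
    }

    /** Masks the entity spans, runs the translator on the masked text, and
     *  restores the resolved names. Spans are assumed not to overlap. */
    static String translateWithBypass(String source,
                                      List<EntitySpan> spans,
                                      UnaryOperator<String> translator) {
        // Work from the last span to the first so that earlier offsets stay
        // valid while the text is being modified.
        List<EntitySpan> ordered = new ArrayList<>(spans);
        ordered.sort(Comparator.comparingInt((EntitySpan s) -> s.start).reversed());

        StringBuilder masked = new StringBuilder(source);
        List<String> placeholders = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < ordered.size(); i++) {
            EntitySpan span = ordered.get(i);
            String placeholder = "__ENT" + i + "__";
            masked.replace(span.start, span.end, placeholder);
            placeholders.add(placeholder);
            names.add(span.targetName);
        }

        // The translator stands in for the call to the MT system; it is
        // assumed to pass unknown tokens such as __ENT0__ through unchanged.
        String translated = translator.apply(masked.toString());

        for (int i = 0; i < placeholders.size(); i++) {
            translated = translated.replace(placeholders.get(i), names.get(i));
        }
        return translated;
    }
}
```

For the input "Angela Merkel besuchte Wien." with spans (0, 13, "Angela Merkel") and (23, 27, "Vienna"), the translator would only see "__ENT1__ besuchte __ENT0__.". Whether the MT system really leaves such placeholders untouched, and in a sensible position, would need to be checked against the actual Moses setup.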