codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

[io.imscwb] Reader produces the same document ID for all texts from the same file #92

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
Although each text from a source file is given its own title, all texts from 
that file receive the same document ID. As a result, a writer such as XmiWriter 
writes every text to the same file name.
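To show why a shared ID is a problem for writers, here is a minimal sketch in plain Java. It assumes, as the report implies, that the output file name is derived from the document ID. The class `IdCollisionDemo`, the `Text` record, and the ID value `corpus.xml` are hypothetical and are not part of the DKPro Core API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a writer that names its output file after the document ID.
// This is not the real XmiWriter; it only illustrates the collision.
class IdCollisionDemo {
    record Text(String documentId, String title) {}

    // Maps output file name -> title of the text last written there.
    // When two texts share an ID, the later put() replaces the earlier entry.
    static Map<String, String> writeAll(List<Text> texts) {
        Map<String, String> files = new LinkedHashMap<>();
        for (Text t : texts) {
            files.put(t.documentId() + ".xmi", t.title());
        }
        return files;
    }

    public static void main(String[] args) {
        // Three texts from one source file, all carrying the same (hypothetical) ID.
        List<Text> texts = List.of(
            new Text("corpus.xml", "Text A"),
            new Text("corpus.xml", "Text B"),
            new Text("corpus.xml", "Text C"));
        System.out.println(writeAll(texts)); // {corpus.xml.xmi=Text C}
    }
}
```

Only one output name survives, and it holds whichever text was written last.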

Original issue reported on code.google.com by richard.eckart on 14 Aug 2012 at 8:12

GoogleCodeExporter commented 9 years ago
I'm about to add three new parameters:

 * PARAM_GENERATE_NEW_IDS - set the document ID from a global counter that starts at 0 and keeps counting across all files until the reader reaches the last text in the last file.
 * PARAM_ID_IS_URL - set DocumentMetaData.uri from the id attribute of the text in the source file; baseUri is set to null.
 * PARAM_REPLACE_NON_XML - replace non-XML characters with spaces.
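A minimal sketch of what PARAM_GENERATE_NEW_IDS and PARAM_REPLACE_NON_XML describe, in plain Java. The class and method names (`ImsCwbSketch`, `nextDocumentId`, `replaceNonXml`) are hypothetical, not the reader's actual code. It assumes that "non-XML characters" means code points outside the XML 1.0 `Char` production and that each such character becomes one space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual ImsCwbReader implementation.
class ImsCwbSketch {
    // One counter for the whole run: starts at 0 and is never reset per file,
    // so every text across all files gets a distinct ID.
    private int nextId = 0;

    String nextDocumentId() {
        return Integer.toString(nextId++);
    }

    // Replaces each code point outside the XML 1.0 Char production with a space.
    // Unpaired surrogates (0xD800-0xDFFF) fall outside the valid ranges and are replaced too.
    static String replaceNonXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
            sb.appendCodePoint(valid ? cp : ' ');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        ImsCwbSketch reader = new ImsCwbSketch();
        List<String> ids = new ArrayList<>();
        // Two source files with two texts each; the counter continues across files.
        for (int file = 0; file < 2; file++) {
            for (int text = 0; text < 2; text++) {
                ids.add(reader.nextDocumentId());
            }
        }
        System.out.println(ids);                               // [0, 1, 2, 3]
        System.out.println(replaceNonXml("a\u0001b\uFFFEc"));  // a b c
    }
}
```

Keeping the counter on the reader instance rather than resetting it per file is what makes the IDs unique across the whole collection, which in turn gives a writer distinct output names.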

Original comment by richard.eckart on 14 Aug 2012 at 11:11

GoogleCodeExporter commented 9 years ago
Committed.

Original comment by richard.eckart on 18 Aug 2012 at 4:29