d3ithwen closed this issue 3 years ago.
Eventually, the text should be tokenized (as in https://ieeexplore.ieee.org/document/8554714).
The data may already be in a suitable format, e.g.:
<Book id='10'>
<Chapter id='1'>
<Verse id='1'>"Now king David was old and stricken in years; and they covered him[...]"</Verse>
...
<Verse id='53'>"So king Solomon sent, and they brought him down from the altar. And[...]"</Verse>
</Chapter>
<Chapter id='2'>
...
</Chapter>
</Book>
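A minimal sketch of reading this format into chapters of tokenized verses, using only the standard library. It assumes a well-formed file (every `<Chapter>` closed and `</Book>` matching the case of `<Book>`, since XML tag names are case-sensitive). `load_book`, `tokenize` and `SAMPLE` are hypothetical names, and the regex word tokenizer is a simple stand-in, not necessarily the tokenization used in the paper linked above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the format shown above (verse texts shortened).
SAMPLE = """<Book id='10'>
<Chapter id='1'>
<Verse id='1'>"Now king David was old and stricken in years"</Verse>
<Verse id='53'>"So king Solomon sent, and they brought him down"</Verse>
</Chapter>
<Chapter id='2'>
<Verse id='1'>"Now the days of David drew nigh that he should die"</Verse>
</Chapter>
</Book>"""

# Stand-in tokenizer: runs of letters, digits and apostrophes.
# Punctuation and the surrounding quote characters are dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def load_book(xml_text: str) -> dict[int, list[list[str]]]:
    """Map chapter id -> list of tokenized verses, in document order."""
    root = ET.fromstring(xml_text)
    chapters: dict[int, list[list[str]]] = {}
    for chapter in root.iter("Chapter"):
        verses = [tokenize(verse.text or "") for verse in chapter.iter("Verse")]
        chapters[int(chapter.get("id"))] = verses
    return chapters


if __name__ == "__main__":
    book = load_book(SAMPLE)
    print(book[1][0][:3])  # ['Now', 'king', 'David']
```

For a real file, `ET.parse(path).getroot()` can replace `ET.fromstring` without changing the rest of the loader.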
This issue was resolved by creating the dataloader.
To create Character Association Networks (CANs), the text should be divided into sections. This can be done on more than one level:
(1) Low level (e.g. chapters), since single chapters mark coherent parts of the story ("scenes").
(2) Higher level (e.g. groups of 10 chapters).
(3) Concluding level: add up the CANs of the chapter groups cumulatively (1-10, 1-20, ..., 1-last chapter).
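The three levels can be sketched as follows, with verses given as token lists. The issue does not define how CAN edges are formed, so this sketch assumes (hypothetically) that two characters are linked whenever both names occur in the same verse, with the edge weight counting such verses. `CHARACTERS`, `chapter_can`, `group_cans` and `cumulative_cans` are illustrative names, not part of the project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hard-coded character list; in practice it might come from NER
# or a curated name list.
CHARACTERS = {"David", "Solomon", "Bathsheba", "Nathan"}


def chapter_can(verses: list[list[str]]) -> Counter:
    """Level 1: CAN of one chapter.

    Assumed edge rule: an undirected edge (a frozenset of two names) gains
    weight 1 for every verse in which both characters are mentioned.
    """
    can: Counter = Counter()
    for tokens in verses:
        present = sorted(CHARACTERS.intersection(tokens))
        for a, b in combinations(present, 2):
            can[frozenset((a, b))] += 1
    return can


def group_cans(chapter_cans: list[Counter], size: int = 10) -> list[Counter]:
    """Level 2: merge the CANs of consecutive groups of `size` chapters."""
    groups = []
    for start in range(0, len(chapter_cans), size):
        merged: Counter = Counter()
        for can in chapter_cans[start:start + size]:
            merged.update(can)  # adds edge weights
        groups.append(merged)
    return groups


def cumulative_cans(groups: list[Counter]) -> list[Counter]:
    """Level 3: running sums of the group CANs (1-10, 1-20, ..., 1-last)."""
    result: list[Counter] = []
    running: Counter = Counter()
    for can in groups:
        running = running + can  # new Counter each step, so results stay distinct
        result.append(running)
    return result


if __name__ == "__main__":
    # Toy data: 25 chapters, each with one verse mentioning David and Solomon.
    chapters = [[["David", "and", "Solomon"]] for _ in range(25)]
    per_chapter = [chapter_can(v) for v in chapters]
    per_group = group_cans(per_chapter, size=10)  # chapters 1-10, 11-20, 21-25
    cumulative = cumulative_cans(per_group)       # chapters 1-10, 1-20, 1-25
    print([c[frozenset(("David", "Solomon"))] for c in cumulative])  # [10, 20, 25]
```

Representing a CAN as a `Counter` keyed by unordered name pairs keeps merging at every level a simple addition of edge weights; a graph library could be substituted once the edge definition is settled.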