SvenElyes / Textanalytics

MIT License

General - Preprocessing - Divide Text into Sections #6

Closed d3ithwen closed 3 years ago

d3ithwen commented 3 years ago

For the creation of Character Association Networks (CANs), the text should be divided into sections. This may be done on more than one level:
(1) low level (e.g. chapters), since single chapters mark coherent parts of the story ("scenes")
(2) higher level (e.g. groups of 10 chapters)
(3) concluding level: add up the CANs of the chapter groups cumulatively (1-10, 1-20, ..., 1 to last chapter)
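The three levels can be sketched roughly as follows. This is only a minimal illustration, not the project's implementation: it assumes a CAN can be represented as a `Counter` of weighted character pairs, and the helper names `chapter_can`, `group_cans` and `cumulative_cans` are hypothetical.

```python
from collections import Counter
from itertools import accumulate, combinations


def chapter_can(characters):
    """Level 1 (toy version): every unordered pair of characters that
    appears in the same chapter gets an edge of weight 1."""
    pairs = combinations(sorted(set(characters)), 2)
    return Counter(pairs)


def group_cans(chapter_cans, size=10):
    """Level 2: add up the chapter CANs in consecutive groups of `size`."""
    return [
        sum(chapter_cans[i:i + size], Counter())
        for i in range(0, len(chapter_cans), size)
    ]


def cumulative_cans(group_level_cans):
    """Level 3: running totals over the groups (chapters 1-10, 1-20, ...)."""
    return list(accumulate(group_level_cans))


# Usage with made-up data: 12 chapters that each mention the same two characters.
chapter_cans = [chapter_can(["David", "Solomon"]) for _ in range(12)]
groups = group_cans(chapter_cans, size=10)   # chapters 1-10 and 11-12
totals = cumulative_cans(groups)             # chapters 1-10 and 1-12
```

Because the CANs are plain counters here, adding two of them simply sums the edge weights, which is what the concluding level needs.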

d3ithwen commented 3 years ago

Eventually, tokenize the text (as in https://ieeexplore.ieee.org/document/8554714)
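As a placeholder, a very simple tokenizer could look like the sketch below. It is an assumption for illustration only and is not the procedure from the linked paper; the function name `tokenize` and the regular expression are hypothetical choices. Capitalization is kept, since it may help when spotting character names later.

```python
import re

# Words made of letters, optionally followed by an apostrophe part ("David's").
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Split a verse into word tokens, dropping punctuation and digits."""
    return WORD_PATTERN.findall(text)


tokens = tokenize("Now king David was old and stricken in years;")
```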

fmunzlin commented 3 years ago

The data may already be in an appropriate format - e.g.:

<Book id='10'>

    <Chapter id='1'>
        <Verse id='1'>"Now king David was old and stricken in years; and they covered him[...]"</Verse>
                    ...
        <Verse id='53'>"So king Solomon sent, and they brought him down from the altar. And[...]"</Verse>
    </Chapter>
    <Chapter id='2'>
        ...
    </Chapter>
</Book>
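If the data really has this shape, the chapter-level split comes almost for free. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes a single well-formed `<Book>` root with `<Chapter>` and `<Verse>` children, as in the excerpt above; the function name `load_chapters` is hypothetical and the actual files may be structured differently.

```python
import xml.etree.ElementTree as ET


def load_chapters(xml_text):
    """Return {chapter_id: [verse_text, ...]} for one <Book> element."""
    book = ET.fromstring(xml_text)
    return {
        chapter.get("id"): [verse.text or "" for verse in chapter.findall("Verse")]
        for chapter in book.findall("Chapter")
    }


# Usage with a tiny made-up sample in the same layout.
sample = (
    "<Book id='10'>"
    "<Chapter id='1'><Verse id='1'>first verse</Verse><Verse id='2'>second verse</Verse></Chapter>"
    "<Chapter id='2'><Verse id='1'>third verse</Verse></Chapter>"
    "</Book>"
)
chapters = load_chapters(sample)
```

Each dictionary entry is then one low-level section from which a chapter CAN could be built.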

SvenElyes commented 3 years ago

https://github.com/godlytalias/Bible-Database/tree/master/English

fmunzlin commented 3 years ago

This issue is resolved by the creation of the dataloader.