andgineer / lexiflux

AI-powered foreign text reader for language learners (Django)

HTML tags inside a word break the HTML #61

Closed andgineer closed 2 months ago

andgineer commented 2 months ago

For example

    &#x27;<br/>razgovora?

after the HTML is cleaned out, the text is

    'razgovora?

This is detected as a single word, and the code tries to include <br/> in it but calculates the word end incorrectly. In fact, it should not include the tag inside the word at all.

It should split a word that has HTML tags inside it into several words, one for each stretch of text between the tags.

If any of those pieces contains only spaces and non-word characters, it should simply be excluded.
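
A minimal sketch of the behaviour described above, not the actual lexiflux code. The function name `split_word_on_tags`, the regexes, and the choice to return offsets into the raw HTML are all assumptions made for illustration. Each fragment is entity-decoded before the check, because otherwise `&#x27;` would look like it contains word characters (`x27`).

```python
import re
from html import unescape

# Hypothetical helpers, named for illustration only.
TAG_RE = re.compile(r"<[^>]+>")   # a simple HTML tag matcher
WORD_CHAR_RE = re.compile(r"\w")  # at least one word character


def split_word_on_tags(raw: str) -> list[tuple[int, int]]:
    """Split a raw "word" at HTML tags.

    Returns (start, end) offsets into `raw` for every tag-free fragment
    that still contains a word character once entities are decoded.
    Fragments made only of spaces and non-word characters are dropped.
    """
    fragments = []
    pos = 0
    for tag in TAG_RE.finditer(raw):
        fragments.append((pos, tag.start()))  # text before this tag
        pos = tag.end()                       # continue after the tag
    fragments.append((pos, len(raw)))         # text after the last tag

    return [
        (start, end)
        for start, end in fragments
        if WORD_CHAR_RE.search(unescape(raw[start:end]))
    ]


raw = "&#x27;<br/>razgovora?"
spans = split_word_on_tags(raw)
words = [raw[start:end] for start, end in spans]
```

For the example from this issue, the fragment before `<br/>` decodes to a lone apostrophe and is excluded, so only `razgovora?` remains, and its offsets never cover the tag.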