charlie-map / wiki-suggestor-service

A C backend that makes suggestions for the Wikiread extension

Document with <img> tag and strange behavior #10

Closed charlie-map closed 2 years ago

charlie-map commented 2 years ago

For some <img> tags, the parser seems to exhibit undefined behavior. For example:

<img
    src="https://wikimedia.org/api/rest_v1/media/math/render/svg/fb9fc371e46e02d0ef51e781e7397629425856b5"
    class="mwe-math-fallback-image-inline"
    aria-hidden="true"
    style="vertical-align: -0.838ex; width:22.631ex; height:2.843ex;"
    alt="{\displaystyle \mathbf {A} \cdot \mathbf {B} =\left\|\mathbf {A} \right\|\left\|\mathbf {B} \right\|\cos \theta }"
>

Notably, this tag has no closing / (it ends with > rather than />), so the current tokenization scheme will likely fail on it and leave erroneous data in the results. A fix should ensure that tokenization of a tag stops at its closing bracket (>), whether or not a / precedes it.
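As a rough illustration of the fix proposed here, the sketch below scans from an opening < to the > that closes the tag, whether or not a / precedes it. It is not Yomu's actual code: the function name find_tag_end and its self_closed out-parameter are hypothetical, and it assumes that a > inside a quoted attribute value (as can occur in an alt string) should not end the tag.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Yomu). html[start] is expected to be '<'.
 * Returns the index of the '>' that closes the tag, or -1 if the input ends
 * before one is found. A '>' inside a quoted attribute value is skipped.
 * If self_closed is non-NULL, it is set to 1 when the tag ends in "/>"
 * and to 0 otherwise.
 */
static long find_tag_end(const char *html, size_t start, int *self_closed)
{
    char quote = 0; /* active quote character, or 0 when outside quotes */
    char last = 0;  /* last non-whitespace character seen outside quotes */

    for (size_t i = start + 1; html[i] != '\0'; i++) {
        char c = html[i];

        if (quote) {
            /* Inside an attribute value: only the matching quote ends it. */
            if (c == quote)
                quote = 0;
            continue;
        }
        if (c == '"' || c == '\'') {
            quote = c;
            last = c;
            continue;
        }
        if (c == '>') {
            /* The tag ends here, with or without a preceding '/'. */
            if (self_closed)
                *self_closed = (last == '/');
            return (long)i;
        }
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r')
            last = c;
    }
    return -1; /* unterminated tag */
}
```

A tokenizer could call find_tag_end(html, pos, &self_closed) on reaching a <, treat everything up to the returned index as one tag token, and resume at the next character. For the <img> example above, self_closed would come back as 0 and the token would still end at the final >.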

charlie-map commented 2 years ago

All of the strange HTML parsing behavior has been fixed in recent debugging work on Yomu.