Closed by matyaskopp 1 year ago
Well spotted. I do in fact convert XML entities to characters in the vertical files, but I forgot that we can have not only `&quot;` but also character entities.
Fixed now in 85ed583, and the corpus on the concordancers has been updated as well.
But, of course, I found other bugs now, sigh...
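For illustration, the entity-to-character conversion mentioned above could look roughly like the following. This is only a minimal sketch, not the code from 85ed583: the function name `decode_token_line` is hypothetical, and it assumes that structure lines in a vertical file start with `<` while token lines are tab-separated columns. Python's `html.unescape` handles named entities as well as decimal and hexadecimal character references (it also knows HTML-only names, which is broader than XML's five predefined entities).

```python
import html


def decode_token_line(line: str) -> str:
    """Decode entities on a token line of a vertical file.

    Assumption: a line starting with '<' is a structure tag and is left
    untouched, so that escaped characters in attribute values survive.
    """
    if line.startswith("<"):
        return line
    # Handles named entities (&quot;), decimal (&#34;) and hex (&#x22;) references.
    return html.unescape(line)


if __name__ == "__main__":
    for sample in ['&quot;\t&quot;\tPUNCT', '&#34;\t&#x22;', '<s id="a&amp;b">']:
        print(repr(decode_token_line(sample)))
```

Skipping structure lines is a simplification; whether attribute values should be decoded too depends on how the conversion and the concordancer treat them.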
@AnnaParla reported this for the UA corpus:
Фракція політичної партії Всеукраїнське об'єднання "Батьківщина"
(and many other parliamentary groups)
https://www.clarin.si/ske-beta/#text-type-analysis?corpname=parlamint30_ua&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.speaker_party_name
I also saw it in the PT corpus:
Grupo Parlamentar do Partido Ecologista "Os Verdes"
https://www.clarin.si/ske-beta/#text-type-analysis?corpname=parlamint30_pt&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.speaker_party_name
@TomazErjavec, I remember we were discussing this maybe two years back. I don't remember if there was anything you could do... Perhaps a newer NoSketch Engine version solves this?
Other possible solutions: we can recommend avoiding the `"` character, or replacing it with a different one in the conversion to vert.
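The second option, replacing the character during conversion to vert, could be sketched as below. This assumes that the straight double quote inside the attribute value is what causes the problem, which has not been confirmed here. The helper names `sanitize_attr_value` and `structure_tag` are hypothetical, and the apostrophe is just one possible replacement.

```python
def sanitize_attr_value(value: str, replacement: str = "'") -> str:
    """Replace straight double quotes in an attribute value (assumed fix)."""
    return value.replace('"', replacement)


def structure_tag(name: str, attrs: dict[str, str]) -> str:
    """Build an opening structure tag with sanitized attribute values."""
    parts = [f'{key}="{sanitize_attr_value(val)}"' for key, val in attrs.items()]
    joined = " ".join(parts)
    return f"<{name} {joined}>"


if __name__ == "__main__":
    party = 'Grupo Parlamentar do Partido Ecologista "Os Verdes"'
    print(structure_tag("speech", {"speaker_party_name": party}))
```

A typographic quote pair could be used instead of the apostrophe if the original look of the name matters; either way the value in the corpus would no longer match the source metadata exactly.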