Status: Open. Fernando-Melo opened this issue 7 years ago.
I'm fairly sure these results were indexed with character-encoding errors, so there is nothing we can do in production to fix them.
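The following is a minimal reproduction of why this kind of damage cannot be repaired after indexing. It assumes (the thread does not say so) that the affected pages were served as ISO-8859-1 and that the indexer decoded them as UTF-8, replacing invalid bytes. `index_decode` is a hypothetical stand-in for that decoding step, not Arquivo.pt code. Every invalid byte becomes the same U+FFFD character, so different originals collapse into identical indexed text.

```python
REPLACEMENT_CHAR = "\ufffd"  # what lossy decoders insert for invalid bytes


def index_decode(page_text: str) -> str:
    """Hypothetical indexer step: Latin-1 bytes wrongly decoded as UTF-8, lossily."""
    raw = page_text.encode("latin-1")             # bytes as the page served them
    return raw.decode("utf-8", errors="replace")  # each invalid byte -> U+FFFD


print(index_decode("Felícia Cabrita, contratação, 18\u00a0de\u00a0Novembro"))
# Fel�cia Cabrita, contrata��o, 18�de�Novembro

# Two different names become the same indexed string: the original byte is gone.
print(index_decode("Felícia") == index_decode("Felácia"))  # True
```

The output has the same pattern as the search snippet in this issue: accented letters and the non-breaking spaces in "18 de Novembro" each turn into �. Only the raw bytes kept before decoding can recover the original text.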
The problem is in the extracted text, so there is nothing we can do in production. I will re-tag the issue so we can test the new indexing against this collection.
"snippet": "<em>Usou</em> <em>filho</em> <em>como</em> <em>testa</em>-<em>de</em>-<em>ferro</em> - Sociedade - Sol sexta-feira, 18 <em>de</em> Novembro <em>de</em> 2011, 16:03<span class=\"ellipsis\"> ... </span> <em>Usou</em> <em>filho</em> <em>como</em> <em>testa</em>-<em>de</em>-<em>ferro</em> 18�<em>de</em>�Novembro,�2011 por Fel�cia Cabrita Duarte Lima <em>usou</em> o <em>filho</em>, Pedro<span class=\"ellipsis\"> ... </span> recente contrata��o do... � � Sociedade <em>Usou</em> <em>filho</em> <em>como</em> <em>testa</em>-<em>de</em>-<em>ferro</em> � � Tecnologia Smartphones<span class=\"ellipsis\"> ... </span>",
Related: https://github.com/arquivo/pwa-technologies/issues/1059
Test whether these special cases can be fixed in the full-text information extraction process.
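One possible approach at extraction time, sketched under assumptions: decode the raw page bytes strictly and fall back through a list of likely charsets, instead of letting the decoder insert U+FFFD. `decode_page` is a hypothetical helper, and the fallback order (declared charset, then UTF-8, then Windows-1252, then Latin-1) is an illustrative choice, not what the Arquivo.pt pipeline does.

```python
from typing import Optional


def decode_page(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes strictly, trying charsets in order instead of replacing bytes.

    Hypothetical helper: the declared charset (e.g. from the HTTP header or a
    <meta charset> tag) is tried first, then UTF-8, then Windows-1252.
    """
    candidates = ([declared] if declared else []) + ["utf-8", "cp1252"]
    for charset in candidates:
        try:
            return raw.decode(charset)  # strict: raises instead of inserting U+FFFD
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset name; try the next candidate
    # Latin-1 maps every byte to a character, so this last step cannot fail.
    return raw.decode("latin-1")


print(decode_page("Felícia, contratação".encode("latin-1")))  # Felícia, contratação
```

Trying strict UTF-8 before the single-byte charsets works because Latin-1 text with accented letters is almost never valid UTF-8. The weak spot is a declared charset that is wrong but still decodes without error, which yields readable-looking mojibake rather than U+FFFD.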
Re-evaluate after the Solr implementation for text search.
If you run the following full-text search query, the first results show encoding issues:
http://arquivo.pt/search.jsp?l=pt&query=usou+filho+como+testa+de+ferro&btnSubmit=Pesquisar&dateStart=01%2F01%2F1996&dateEnd=31%2F12%2F2015
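When testing new indexing against this collection, one could measure how many returned snippets still contain U+FFFD, the character that appears as � in the snippet quoted earlier in this issue. `is_damaged` and `damaged_fraction` are hypothetical helpers, and the check assumes damage always surfaces as U+FFFD; mojibake such as "Ã©" would not be caught.

```python
from typing import Iterable

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, inserted by lossy decoders for invalid bytes


def is_damaged(snippet: str) -> bool:
    """True if the snippet contains any U+FFFD replacement character."""
    return REPLACEMENT_CHAR in snippet


def damaged_fraction(snippets: Iterable[str]) -> float:
    """Share of snippets showing encoding damage (0.0 for an empty input)."""
    items = list(snippets)
    if not items:
        return 0.0
    return sum(is_damaged(s) for s in items) / len(items)


print(damaged_fraction(["Fel\ufffdcia Cabrita", "Felícia Cabrita"]))  # 0.5
```

Running this on the snippets returned for the query above, before and after re-indexing, would give a simple number to compare.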