aaccomazzi opened 3 years ago
After checking how we encode data in SOLR, I believe these are the fields for which we need to escape the basic XML special characters ("<", ">", and "&"):
The code for this should be as simple as:
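
    def escape(text):
        # Replace "&" first so the ampersands introduced by "&lt;" and "&gt;"
        # are not escaped a second time.
        text = text.replace("&", "&amp;")
        text = text.replace("<", "&lt;")
        text = text.replace(">", "&gt;")
        return text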
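The Python standard library's `xml.sax.saxutils.escape` performs the same three substitutions (also replacing "&" first), so it could likely be used instead of a hand-written helper. The sketch below compares the two and shows why the order of the replacements matters; `escape_wrong_order` is a hypothetical, deliberately broken variant included only for illustration.

```python
from xml.sax.saxutils import escape as sax_escape


def escape(text):
    # Same three replacements as above; "&" must be handled first, otherwise
    # the "&" inside "&lt;" and "&gt;" would itself become "&amp;".
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


def escape_wrong_order(text):
    # Hypothetical, deliberately incorrect: escaping "<" before "&"
    # turns "&lt;" into "&amp;lt;" (double escaping).
    text = text.replace("<", "&lt;")
    text = text.replace("&", "&amp;")
    return text


sample = "x < 5 & y > 3"
print(escape(sample))
print(sax_escape(sample))
print(escape_wrong_order("a < b"))
```

Both `escape` and `sax_escape` turn the sample into `x &lt; 5 &amp; y &gt; 3`, while the wrong-order variant produces `a &amp;lt; b`.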
An issue surfaced with the encoding of "<" (the basic "&lt;" XML entity), which may have been caused by direct ingest (or a bug in classic ingest). One such example is bibcode 2020arXiv201000466H. When properly encoded, the abstract should have the following content:

Rather than:
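To find other records affected the same way, one option is a consistency check over the stored fields. The sketch below assumes the fields are meant to contain exactly the output of the `escape()` function above; under that assumption, a correctly escaped field never contains a raw "<" or ">", and every "&" starts one of `&amp;`, `&lt;` or `&gt;`. The helper name `is_escaped` is hypothetical.

```python
import re

# In the output of escape(), "&" can only begin one of these three
# entity references; any other "&" means the text was not escaped.
_BARE_AMPERSAND = re.compile(r"&(?!amp;|lt;|gt;)")


def is_escaped(text):
    """Hypothetical check: True if text could be the output of escape()."""
    if "<" in text or ">" in text:
        return False  # raw markup characters slipped through unescaped
    return _BARE_AMPERSAND.search(text) is None


for field in ["a &lt; b", "a < b", "R&D", "R&amp;D", "a &amp;lt; b"]:
    print(repr(field), is_escaped(field))
```

This flags `a < b` and `R&D`, but it cannot detect text that was escaped twice (`a &amp;lt; b` passes), because double-escaped text is itself valid escaped text; catching that case would need a separate heuristic.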