DataONEorg / d1_cn_index_processor

The CN index processor component

schema.org indexing appends type to 'abstract' field #15

Closed gothub closed 3 years ago

gothub commented 3 years ago

For certain schema.org documents, the parser does not strip the datatype from the 'abstract' field. See the "Abstract" at https://search-sandbox.test.dataone.org/view/urn%3Auuid%3A4ad54da7-d5c0-4497-91c7-4c004f8a5be2, which has the string "^^https://schema.org/HTML" appended to the end.

The source json-ld document has:

"description": {
    "@type": "HTML",
    "@value":"<p>"Winter ecology of larval kril..."
},

So the SPARQL query that extracts "description" -> "abstract" needs to strip the datatype from this field. SPARQL's str() returns only the lexical form of a literal, without its datatype, so for example:
            SELECT ( str(?description) as ?abstract )
instead of:
            SELECT ( ?description as ?abstract )


Some queries already do this for certain values, but it should probably be done for all values.
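
To illustrate why the suffix shows up, here is a minimal Python sketch of the two projections. It is a toy model only, not the indexer's code: `TypedLiteral`, `serialized`, and `sparql_str` are hypothetical names, and the `value^^datatype` rendering is assumed from the string seen in the indexed abstract.

```python
from dataclasses import dataclass

# Datatype IRI as it appears appended to the indexed abstract.
SCHEMA_HTML = "https://schema.org/HTML"


@dataclass(frozen=True)
class TypedLiteral:
    """Toy stand-in for an RDF typed literal (hypothetical, not a library class)."""

    lexical: str
    datatype: str

    def serialized(self) -> str:
        # Assumed rendering when the whole literal is written to an index field.
        return f"{self.lexical}^^{self.datatype}"


def sparql_str(literal: TypedLiteral) -> str:
    """Mimic SPARQL's str(): keep the lexical form, drop the datatype."""
    return literal.lexical


# Value truncated exactly as in the JSON-LD excerpt above.
description = TypedLiteral('<p>"Winter ecology of larval kril...', SCHEMA_HTML)

# Corresponds to: SELECT ( ?description as ?abstract )
unfixed_abstract = description.serialized()

# Corresponds to: SELECT ( str(?description) as ?abstract )
fixed_abstract = sparql_str(description)

print(unfixed_abstract)
print(fixed_abstract)
```

The first printed value carries the "^^https://schema.org/HTML" suffix, while the second is the bare description text, which is what the 'abstract' field should contain.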

gothub commented 3 years ago

Fixed in commit d7555df0fd98d0c32849a0107f0e01033d82fd97