DataONEorg / d1_cn_index_processor

The CN index processor component

Json-ld subprocess can't process legitimate schema.org objects #34

Closed taojing2002 closed 2 years ago

taojing2002 commented 2 years ago

Our indexer couldn't process the content of the schema.org objects from the Hakai IYS Catalog member node. The error message is:

cn-index-processor-daemon.log.7:[ WARN] 2022-05-05 18:14:15,377 (SolrIndexService:processObject:241) The subprocessor org.dataone.cn.indexer.parser.JsonLdSubprocessor can't process the id sha256:7bf3f2000c610da060004d517032b45d1681b1c88bdbf60ecc290649ceb1d203 since The Processor cannot find the either prefix of https://schema.org/ or http://schema.org/ in the expanded json-ld object.. However, the index still can be achieved without this part of information provided by the processor.

taojing2002 commented 2 years ago

It turns out that the method that determines whether the schema.org namespace starts with https or http only looks at the keys of the first-level map in the content. In the objects from the Hakai IYS Catalog member node, however, the keys that start with https://schema.org sit one level deeper in the map, so the method couldn't find them. I switched to a recursive algorithm to fix the issue.
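
For readers who want to see the idea, here is a minimal sketch of a recursive key search over an expanded JSON-LD structure. It is not the code from the actual fix: the class name `SchemaOrgPrefixFinder` and the method `findPrefix` are hypothetical, and it assumes the expanded document is represented as nested `java.util.List` and `java.util.Map` values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the JsonLdSubprocessor code.
final class SchemaOrgPrefixFinder {
    static final String HTTPS_PREFIX = "https://schema.org/";
    static final String HTTP_PREFIX = "http://schema.org/";

    /**
     * Walks maps and lists depth-first and returns the first schema.org
     * prefix found on any map key, or null if no key carries either prefix.
     * Assumption: the expanded JSON-LD is made of nested Maps and Lists.
     */
    static String findPrefix(Object node) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                if (key.startsWith(HTTPS_PREFIX)) {
                    return HTTPS_PREFIX;
                }
                if (key.startsWith(HTTP_PREFIX)) {
                    return HTTP_PREFIX;
                }
                // The key did not match, so look inside its value.
                String found = findPrefix(entry.getValue());
                if (found != null) {
                    return found;
                }
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                String found = findPrefix(item);
                if (found != null) {
                    return found;
                }
            }
        }
        // Scalars (strings, numbers, null) have no keys to inspect.
        return null;
    }
}
```

A search limited to the top-level map would miss a key such as `https://schema.org/name` when it is nested under another key (for example `@graph`), which matches the failure described above; the recursive walk reaches it at any depth.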

amoeba commented 2 years ago

Relevant commits here are https://github.com/DataONEorg/d1_cn_index_processor/commit/d69498c5fd933f3ea7d031c2add8910f744503ca and https://github.com/DataONEorg/d1_cn_index_processor/commit/f9528cb51f0f379da93580c5b04c9b0c594ac363.