NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Metacat-index does not handle <references> #926

Open mbjones opened 6 years ago

mbjones commented 6 years ago

Author Name: ben leinfelder (ben leinfelder) Original Redmine Issue: 6040, https://projects.ecoinformatics.org/ecoinfo/issues/6040 Original Date: 2013-07-25 Original Assignee: Jing Tao


I indexed a document from EVOS that uses a reference for a creator rather than the details of the person:

<creator><references>1359152217358</references></creator>

But in the index it shows up as "||" instead of following the reference back to the id where it was declared:

<associatedParty id="1359152217358">...

http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-07-26T00:12:44Z


Here is a bit of the bean definition used by indexing to pick out the content from EML:

<bean id="eml.origin" class="org.dataone.cn.indexer.parser.CommonRootSolrField"
    p:multivalue="true"
    p:root-ref="originRoot">
    <constructor-arg name="name" value="origin" />
</bean>

<bean id="originRoot" class="org.dataone.cn.indexer.parser.utility.RootElement"
    p:name="origin"
    p:xPath="//dataset/creator"
    p:template="[individualName]||[organizationName]">
    <property name="leafs"><list><ref bean="organizationNameLeaf"/></list></property>
    <property name="subRoots"><list><ref bean="individualNameRoot" /></list></property>
</bean>

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-03T18:07:41Z


Apparently this is fixed in cn-index-processor v1.2.0 -- so we will need to pull in this newer dependency in metacat-index and adjust the code accordingly.

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-03T19:01:18Z


This is included in the 1.2.0 d1 index release. It will not include || but will use blanks instead. Not a great "solution", but better.

mbjones commented 6 years ago

Original Redmine Comment Author Name: Matt Jones (Matt Jones) Original Date: 2013-10-03T19:17:13Z


Spaces aren't really sufficient as a solution, and there are a lot of references fields in EML. We probably need to contribute a fix for this if Skye is not going to fix it for DataONE.

mbjones commented 6 years ago

Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2013-10-10T16:36:15Z


Skye said that a SAX parser is used to parse that information. This change may require switching to a DOM parser, which would be a big change.
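
For comparison, here is a rough sketch of what resolving references could look like once the whole document is held in memory. It is written in Python with ElementTree standing in for a DOM parser, purely as an illustration; it is not the cn-index-processor code, and the function name, the sample document, and the names in it are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <references> value and the id come from the issue.
SAMPLE = """<dataset>
  <creator><references>1359152217358</references></creator>
  <associatedParty id="1359152217358">
    <individualName><givenName>Ann</givenName><surName>Lee</surName></individualName>
    <organizationName>Example Org</organizationName>
  </associatedParty>
</dataset>"""


def resolve_references_dom(xml_text):
    """Return (parent tag, resolved text or None) for every <references> child."""
    root = ET.fromstring(xml_text)
    # With the full tree in memory, every id-bearing element can be indexed up front.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = []
    for parent in root.iter():
        ref = parent.find("references")  # direct child only
        if ref is None:
            continue
        target = by_id.get((ref.text or "").strip())
        if target is None:
            resolved.append((parent.tag, None))  # dangling reference
        else:
            # Collapse all descendant text into one whitespace-normalized string.
            text = " ".join(" ".join(target.itertext()).split())
            resolved.append((parent.tag, text))
    return resolved


print(resolve_references_dom(SAMPLE))
```

Because the id lookup is built before any reference is followed, the order of declaration and use does not matter here. The cost is holding the whole document in memory.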

mbjones commented 6 years ago

Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-10T16:39:54Z


Even with a SAX parser, the implementation could keep track of all elements with "id" attributes and anytime a "references" element is encountered, substitute with that node. The tricky part would be when we encounter a references element before the actual element that declares the id -- would have to track the references that are unfulfilled and fill them in when we actually get to the id elements.
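
A minimal sketch of that idea, in Python's xml.sax rather than the Java indexer, and only as an illustration under assumptions: the handler class, the helper function, and the sample document are hypothetical. It takes the simplest variant of the approach and defers every reference until the end of the document, rather than filling each one in as soon as its id has been seen.

```python
import xml.sax

# Hypothetical sample: the reference appears BEFORE the element that declares the id.
SAMPLE = """<dataset>
  <creator><references>1359152217358</references></creator>
  <associatedParty id="1359152217358">
    <individualName><givenName>Ann</givenName><surName>Lee</surName></individualName>
    <organizationName>Example Org</organizationName>
  </associatedParty>
</dataset>"""


class ReferenceResolvingHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.path = []       # names of the elements currently open
        self.by_id = {}      # id -> text chunks seen under that element
        self.open_ids = []   # (depth, id) for id-bearing elements still open
        self.pending = []    # (parent name, referenced id), resolved at end
        self.resolved = []   # (parent name, resolved text or None)
        self.in_references = False
        self.ref_text = []

    def startElement(self, name, attrs):
        self.path.append(name)
        element_id = attrs.get("id")
        if element_id is not None:
            self.by_id[element_id] = []
            self.open_ids.append((len(self.path), element_id))
        if name == "references":
            self.in_references, self.ref_text = True, []

    def characters(self, content):
        if self.in_references:
            self.ref_text.append(content)
        else:
            for _, element_id in self.open_ids:
                self.by_id[element_id].append(content)

    def endElement(self, name):
        if name == "references":
            parent = self.path[-2] if len(self.path) > 1 else None
            self.pending.append((parent, "".join(self.ref_text).strip()))
            self.in_references = False
        for _, element_id in self.open_ids:
            self.by_id[element_id].append(" ")  # keep adjacent elements' text apart
        if self.open_ids and self.open_ids[-1][0] == len(self.path):
            self.open_ids.pop()  # the id-bearing element itself just closed
        self.path.pop()

    def endDocument(self):
        # Every id is known by now, so forward references resolve too.
        for parent, ref_id in self.pending:
            chunks = self.by_id.get(ref_id)
            text = None if chunks is None else " ".join("".join(chunks).split())
            self.resolved.append((parent, text))


def resolve_references_sax(xml_text):
    handler = ReferenceResolvingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved


print(resolve_references_sax(SAMPLE))
```

The sketch only collects flattened text for each id. A real fix would presumably need to keep enough structure to feed templates such as [individualName]||[organizationName], which this does not attempt.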

amoeba commented 3 years ago

I put a twin issue for this over on https://github.com/DataONEorg/d1_cn_index_processor/issues/14 too to track changes across the repos.