DASISH / md-mapping

DASISH Task 5.6. Semantic mapping to convert from community specific metadata schemata into the internal schema of the Joint Metadata Domain
GNU General Public License v3.0

Default mapping for title in cmdi-mapping have strange values #8

Open borsna opened 9 years ago

borsna commented 9 years ago

For the title field we get strange values from CMDI. We get the title "mar24_09" instead of "Swedish Goteborg Corpus" for: http://ckan.dasish.eu/ckan/dataset/68de715e04f6ac2a4faf8d7e5a017174b4ea096d368a61976004a031239113e5

The first XPath evaluated gives us an undesired field for the title: https://github.com/DASISH/md-mapping/blob/master/mapfiles/cmdi.xml#L48-L53

The generated XPath seems really complicated. Could we just use //cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text() as a default?

menzowindhouwer commented 9 years ago

The generated XPath is equivalent to the concept-to-XPath mapping done by the CLARIN VLO importer. The outcome for this dataset is thus the same as in the CLARIN VLO: http://catalog.clarin.eu/vlo/record?2&docId=https_58__47__47_corpus1.mpi.nl_47_media-archive_47_mirrored_corpora_47_childes_47_Swedish_47_Metadata_47_mar24_09.imdi&q=mar24_09&index=2&count=3

menzowindhouwer commented 9 years ago

To have a DASISH default, we could extend the mapping template with a placeholder, so we can better determine the order of the generated XPaths:

  <field name="title" cmd:facetConcepts="name">
    <xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Title/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Name/text()</xpath>
    <cmd:xpaths/> <!-- << NEW -->
  </field>

borsna commented 9 years ago

I see, but would it not be more logical to use the Title field if it is available?

<Name>mar24_09</Name>
<Title>"Swedish Goteborg Corpus"</Title>
<Date>1984-09-23</Date>
<descriptions>
<Description LanguageId="">
longitudinal study of two monolingual Swedish children
</Description>
</descriptions>

Source: http://ckan.dasish.eu/work/01-harvested/clarin/results/cmdi/The_Language_Archive_s_IMDI_portal/0081/oai_www_mpi_nl_MPI1305330.xml

CKAN is at the moment full of strange titles, which seem to come from the Name field.
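To illustrate why the order matters, here is a minimal Python sketch, assuming the mapper takes the first XPath that returns a non-empty value. The record is a trimmed, hypothetical wrapper around the Name and Title values quoted above, the namespace is assumed to be the CMDI 1.1 one, and `first_match` is an illustrative helper, not part of md-mapping.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 namespace. The record is a trimmed stand-in that
# only keeps the Name and Title values quoted in this thread.
NS = {"cmd": "http://www.clarin.eu/cmd/"}

RECORD = """
<CMD xmlns="http://www.clarin.eu/cmd/">
  <Components>
    <Session>
      <Name>mar24_09</Name>
      <Title>"Swedish Goteborg Corpus"</Title>
    </Session>
  </Components>
</CMD>
"""

SESSION_NAME = "//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()"
SESSION_TITLE = "//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()"


def first_match(record_xml, xpaths):
    """Return the text of the first XPath that yields a non-empty value."""
    # ElementTree cannot match the root element with a leading '//',
    # so the record is hung under a dummy parent and searched from there.
    wrapper = ET.Element("wrapper")
    wrapper.append(ET.fromstring(record_xml.strip()))
    for xpath in xpaths:
        # ElementTree has no text() step: drop it and read .text instead.
        if xpath.endswith("/text()"):
            xpath = xpath[: -len("/text()")]
        for element in wrapper.findall("." + xpath, NS):
            text = (element.text or "").strip()
            if text:
                return text
    return None


name_first = first_match(RECORD, [SESSION_NAME, SESSION_TITLE])
title_first = first_match(RECORD, [SESSION_TITLE, SESSION_NAME])
print(name_first)   # mar24_09
print(title_first)  # "Swedish Goteborg Corpus"
```

With Name listed first the record gets the file-like title; with Title listed first the descriptive title wins, and Name is still used as a fallback for records without a Title.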

menzowindhouwer commented 9 years ago

Moved @cmd:facetConcepts to <cmd:facet/>. This element will be replaced by the <xpath>s found, thus allowing better control over the order of the XPaths.

```xml
<field name="title">
    <xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Title/text()</xpath>
    <xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Name/text()</xpath>
    <cmd:facet>name</cmd:facet> <!-- << NEW -->
</field>
```
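The placeholder expansion could work roughly as in the following Python sketch. It is not the actual md-mapping implementation: the template namespace URI, the `expand_facets` helper, and the list of generated XPaths are all assumptions made for illustration. The point is only that hand-written `<xpath>` elements keep their position while each `<cmd:facet>` is replaced in place by the XPaths found for that facet.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the template's cmd: prefix.
CMD_NS = "urn:example:cmd-template"

TEMPLATE = f"""
<field name="title" xmlns:cmd="{CMD_NS}">
    <xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()</xpath>
    <cmd:facet>name</cmd:facet>
</field>
"""

# Stand-ins for the XPaths the concept lookup would generate for "name".
GENERATED = {
    "name": [
        "//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()",
        "//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Name/text()",
    ]
}


def expand_facets(field, xpaths_by_facet):
    """Return a copy of <field> with every <cmd:facet> replaced, in place,
    by one <xpath> element per XPath found for that facet."""
    result = ET.Element(field.tag, dict(field.attrib))
    for child in field:
        if child.tag == f"{{{CMD_NS}}}facet":
            for xpath in xpaths_by_facet.get(child.text, []):
                generated = ET.SubElement(result, "xpath")
                generated.text = xpath
        else:
            # Hand-written entries are kept at their original position.
            result.append(child)
    return result


expanded = expand_facets(ET.fromstring(TEMPLATE.strip()), GENERATED)
for element in expanded:
    print(element.text)
```

Because the placeholder sits after the hand-written entry, the Title XPath is evaluated before any generated Name XPath; moving the placeholder to the top of the field would reverse that priority.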