Open borsna opened 9 years ago
The generated XPath is equivalent to the concept-to-XPath mapping done by the CLARIN VLO importer. The outcome for this dataset is thus the same as in the CLARIN VLO: http://catalog.clarin.eu/vlo/record?2&docId=https_58__47__47_corpus1.mpi.nl_47_media-archive_47_mirrored_corpora_47_childes_47_Swedish_47_Metadata_47_mar24_09.imdi&q=mar24_09&index=2&count=3
To have a DASISH default, we could extend the mapping template with a placeholder so that we can better control the order of the generated XPaths:
<field name="title" cmd:facetConcepts="name">
<xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Title/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Name/text()</xpath>
<cmd:xpaths/> <!-- << NEW -->
</field>
I see, but would it not be more logical to use the title field if it is available?
<Name>mar24_09</Name>
<Title>"Swedish Goteborg Corpus"</Title>
<Date>1984-09-23</Date>
<descriptions>
<Description LanguageId="">
longitudinal study of two monolingual Swedish children
</Description>
</descriptions>
CKAN is at the moment full of strange titles, which seem to come from the name field.
Moved @cmd:facetConcepts to <cmd:facet/>. This element will be replaced by the
<field name="title">
<xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:Session/cmd:Name/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Title/text()</xpath>
<xpath>//cmd:CMD/cmd:Components/cmd:imdi-corpus/cmd:Corpus/cmd:Name/text()</xpath>
<cmd:facet>name</cmd:facet> <!-- << NEW -->
</field>
For the title field we get strange values from CMDI. We get the title "mar24_09" instead of "Swedish Goteborg Corpus" for: http://ckan.dasish.eu/ckan/dataset/68de715e04f6ac2a4faf8d7e5a017174b4ea096d368a61976004a031239113e5
The first XPath evaluated gives us an undesired field for the title: https://github.com/DASISH/md-mapping/blob/master/mapfiles/cmdi.xml#L48-L53
The generated XPath seems really complicated; could we just use
//cmd:CMD/cmd:Components/cmd:Session/cmd:Title/text()
as a default?
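To illustrate why the order of the XPaths matters, here is a minimal sketch of "first match wins" evaluation. It is not the md-mapping implementation. The namespace URI (assumed to be the CMDI 1.1 one), the trimmed record, and the `first_match` helper are all assumptions made for illustration. Python's `xml.etree.ElementTree` does not support `text()`, so the paths stop at the element and the text is read from it.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDI 1.1 namespace URI; adjust to whatever the records declare.
NS = {"cmd": "http://www.clarin.eu/cmd/"}

# Hypothetical, trimmed record modelled on the example in this thread.
RECORD = """
<CMD xmlns="http://www.clarin.eu/cmd/">
  <Components>
    <Session>
      <Name>mar24_09</Name>
      <Title>Swedish Goteborg Corpus</Title>
    </Session>
  </Components>
</CMD>
""".strip()


def first_match(root, paths):
    """Return the text of the first path that yields a non-empty value."""
    for path in paths:
        node = root.find(path, NS)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


root = ET.fromstring(RECORD)  # root is the cmd:CMD element

# Paths are relative to cmd:CMD; Title is tried before Name.
title_first = [
    "./cmd:Components/cmd:Session/cmd:Title",
    "./cmd:Components/cmd:Session/cmd:Name",
]
# Same paths, but Name is tried before Title.
name_first = list(reversed(title_first))

print(first_match(root, title_first))  # Swedish Goteborg Corpus
print(first_match(root, name_first))   # mar24_09
```

With Title listed first, the record yields the human-readable title and only falls back to Name when no Title is present.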