Open madryk opened 3 years ago
original: harvested:
-- citation metadata --
--- <stdyDscr><citation><titlStmt> ---
title -> <titl> -> title
subtitle -> <subTitl> -> subtitle
alternativeTitle -> <altTitl> -> alternativeTitle
--- <stdyDscr><citation><rspStmt> ---
author:authorName -> <AuthEnty> -> author:authorName
author:authorAffiliation -> <AuthEnty affiliation> -> author:authorAffiliation
nothing -> nothing -> datasetContact:datasetContactEmail == N/A
--- <stdyDscr><stdyInfo> ---
dsDescription:dsDescriptionValue -> <abstract> -> dsDescription:dsDescriptionValue
subject -> <subject><keyword> -> keyword:keywordValue
keyword:keywordValue -> <subject><keyword> -> keyword:keywordValue
keyword:keywordVocab -> <subject><keyword vocab> -> keyword:keywordVocabulary
topicClassification:topicClassValue -> <subject><topcClas> -> topicClassification:topicClassValue
topicClassification:topicClassVocab -> <subject><topcClas vocab> -> topicClassification:topicClassVocab
notesText -> <notes> -> notesText
--- <stdyDscr><citation><prodStmt> ---
producer:producerAbbreviation -> <producer abbr> -> producer:producerAbbreviation
producer:producerAffiliation -> <producer affiliation> -> producer:producerAffiliation
producer:producerURL -> <producer URI> -> producer:producerURL
producer:producerName -> <producer> -> producer:producerName
producer:producerLogoURL -> <producer role> -> producer:producerLogoURL
productionDate -> <prodDate> -> productionDate
productionPlace -> <prodPlac> -> productionPlace
grantNumber:grantNumberAgency -> <grantNo agency> -> grantNumber:grantNumberAgency
grantNumber:grantNumberValue -> <grantNo> -> grantNumber:grantNumberValue
--- <stdyDscr><citation><distStmt> ---
RepOD (name of root dv) -> <distrbtr> -> distributor:distributorName
system publication date -> <distDate> -> distributionDate
--- <stdyDscr><stdyInfo><sumDscr> ---
timePeriodCovered:timePeriodCoveredStart -> * - <timePrd cycle="P1" event="start"> -> timePeriodCovered:timePeriodCoveredStart
timePeriodCovered:timePeriodCoveredEnd -> * - <timePrd cycle="P1" event="end"> -> timePeriodCovered:timePeriodCoveredEnd
dateOfCollection:dateOfCollectionStart -> * - <collDate cycle="P1" event="start"> -> dateOfCollection:dateOfCollectionStart
dateOfCollection:dateOfCollectionEnd -> * - <collDate cycle="P1" event="end"> -> dateOfCollection:dateOfCollectionEnd
-- geospatial metadata --
--- <stdyDscr><stdyInfo><sumDscr> ---
** - mess -> -> geographicCoverage:country
** - mess -> -> geographicCoverage:otherGeographicCoverage
geographicUnit -> <geogUnit> -> geographicUnit
-- social science metadata --
--- <stdyDscr><stdyInfo><sumDscr> ---
universe -> <universe> -> universe
--- <stdyDscr><method><dataColl> ---
dataCollector -> <dataCollector> -> dataCollector
collectorTraining -> <collectorTraining> -> collectorTraining
frequencyOfDataCollection -> <frequence> -> frequencyOfDataCollection
deviationsFromSampleDesign -> <deviat> -> deviationsFromSampleDesign
dataCollectionSituation -> <callSitu> -> dataCollectionSituation
actionsToMinimizeLoss -> <actMin> -> actionsToMinimizeLoss
controlOperations -> <conOps> -> controlOperations
cleaningOperations -> <cleanOps> -> cleaningOperations
--- <stdyDscr><method><anlyInfo> ---
responseRate -> <respRate> -> responseRate
otherDataAppraisal -> <dataAppr> -> otherDataAppraisal
--- <stdyDscr><method> ---
socialScienceNotes:socialScienceNotesType -> <notes type> -> socialScienceNotes:socialScienceNotesType
socialScienceNotes:socialScienceNotesSubject -> <notes subject> -> socialScienceNotes:socialScienceNotesSubject
socialScienceNotes:socialScienceNotesText -> <notes> -> socialScienceNotes:socialScienceNotesText
* - value or date attribute
** - values are mixed up
In RepOD we have:
geographicCoverage: [{
country: "Algeria",
state: "Stan w Algierii?",
city: "Nie znam",
otherGeographicCoverage: "Elo inne"
}, {
country: "Poland",
state: "Woj. Mazowieckie",
city: "Warszawa",
otherGeographicCoverage: "Nie warszawa"
}]
is translated to:
geographicCoverage: [{
otherGeographicCoverage: "Nie znam; Stan w Algierii?; Elo inne"
}, {
country: "Algeria",
otherGeographicCoverage: "Woj. Mazowieckie"
}, {
country: "Poland",
otherGeographicCoverage: "Warszawa; Nie warszawa"
}]
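To make the "values are mixed up" note concrete, here is a minimal Python sketch that compares the two structures above. It only uses the example values from this issue; the `flatten` helper is hypothetical, and it assumes the harvested side joins multiple values with a semicolon, as the example suggests.

```python
from collections import Counter

# Example values copied from the issue text.
original = [
    {"country": "Algeria", "state": "Stan w Algierii?", "city": "Nie znam",
     "otherGeographicCoverage": "Elo inne"},
    {"country": "Poland", "state": "Woj. Mazowieckie", "city": "Warszawa",
     "otherGeographicCoverage": "Nie warszawa"},
]
harvested = [
    {"otherGeographicCoverage": "Nie znam; Stan w Algierii?; Elo inne"},
    {"country": "Algeria", "otherGeographicCoverage": "Woj. Mazowieckie"},
    {"country": "Poland", "otherGeographicCoverage": "Warszawa; Nie warszawa"},
]


def flatten(entries):
    """Count every individual value, splitting semicolon-joined strings (assumed separator)."""
    values = Counter()
    for entry in entries:
        for value in entry.values():
            values.update(part.strip() for part in value.split(";"))
    return values


# The same values survive harvesting, but they end up grouped differently.
same_values = flatten(original) == flatten(harvested)
same_grouping = len(original) == len(harvested)
print(same_values, same_grouping)
```

In this example no value is lost, but two entries become three and the state and city values land in otherGeographicCoverage of the wrong entry.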
If any of the following metadata fields is filled in, the dataset will NOT be harvested:
cell counting
controlled vocabulary value
https://repod-test.icm.edu.pl/oai?verb=GetRecord&metadataPrefix=dataverse_json&identifier=doi:10.18150/FK2/AJ4VYO
https://repod-test.icm.edu.pl/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.18150/FK2/AJ4VYO
original: harvested:
The date seems to be the date when the last datasetVersion was released.
This is slightly different from what will be presented in the original repository. In the original we will show the date of the last MAJOR datasetVersion release. For example:
V1.0
V1.1
V2.0 - this date
V2.1
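The difference between the two dates can be sketched in Python as below. This is an illustration only: the version records, the field names, and the release dates are hypothetical, and a MAJOR release is assumed to be a version whose minor number is 0.

```python
from datetime import date

# Hypothetical release history matching the V1.0 .. V2.1 example above.
versions = [
    {"major": 1, "minor": 0, "released": date(2020, 1, 10)},
    {"major": 1, "minor": 1, "released": date(2020, 2, 5)},
    {"major": 2, "minor": 0, "released": date(2020, 6, 1)},
    {"major": 2, "minor": 1, "released": date(2020, 7, 15)},
]


def last_major_release_date(versions):
    """Date shown in the original repository: newest version with minor == 0."""
    majors = [v for v in versions if v["minor"] == 0]
    if not majors:
        return None
    return max(majors, key=lambda v: v["major"])["released"]


def last_release_date(versions):
    """Date that seems to be shown after harvesting: newest release of any kind."""
    return max(v["released"] for v in versions)


print(last_major_release_date(versions))  # date of V2.0
print(last_release_date(versions))        # date of V2.1
```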
The year in the citation is taken from the metadata field Distribution Date.
The part "Jestem, Dystrybutorem" is taken from the metadata field Distributor - Name.
original: harvested:
The date seems to be the date when the file was harvested. The md5 and file size info are harvested, but they are not shown.
All metadata will be harvested as it is in the original dataset, except:
Ad. DC on Harvard: https://demo.dataverse.org/api/datasets/export?exporter=dcterms&persistentId=dc_doi%3A10.70122/FK2/YEX6EX
They seem to use dcterms:relation for related datasets and dcterms:isReferencedBy for related publication (in our case only some related publications would count as referencing).
They do not seem to make any use of related materials.
Also, terms and rights need some changes on the RepOD side.
My proposal for how we can inject terms into DDI:
If the dataset IS NOT under embargo
AND all files in the dataset are under the same licence or terms, and none of them is Restricted Access:
<dataAccs>
<notes type="DVN:TOU" level="dv">[Universal License Name]</notes>
<notes type="DVN:TOA" level="dv"></notes>
<setAvail/>
<useStmt/>
</dataAccs>
AND files in the dataset are on different licences, but none of them is restricted access:
<dataAccs>
<notes type="DVN:TOU" level="dv">Different licenses or terms for individual files.</notes>
<notes type="DVN:TOA" level="dv"></notes>
<setAvail/>
<useStmt/>
</dataAccs>
AND files in the dataset are on different licences AND at least one of them is Restricted Access:
<dataAccs>
<notes type="DVN:TOU" level="dv">Different licenses or terms for individual files.</notes>
<notes type="DVN:TOA" level="dv">Access to some files in this dataset is restricted.</notes>
<setAvail/>
<useStmt/>
</dataAccs>
AND all files in the dataset are restricted access AND all of them have the same subterms:
<dataAccs>
<notes type="DVN:TOU" level="dv"></notes>
<notes type="DVN:TOA" level="dv">Access to all files in this dataset is restricted. [Subterms text, for instance: For academic purposes only, not for redistribution]. </notes>
<setAvail/>
<useStmt/>
</dataAccs>
AND all files in the dataset are restricted access AND they have different subterms:
<dataAccs>
<notes type="DVN:TOU" level="dv"></notes>
<notes type="DVN:TOA" level="dv">Access to all files in this dataset is restricted. Different terms for individual files. </notes>
<setAvail/>
<useStmt/>
</dataAccs>
If the dataset IS under embargo:
<dataAccs>
<notes type="DVN:TOU" level="dv"></notes>
<notes type="DVN:TOA" level="dv">Access to all files in this dataset is embargoed. </notes>
<setAvail>
Files in this dataset will be available from [Embargo date YYYY-MM-DD].
</setAvail>
<useStmt/>
</dataAccs>
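The cases above can be sketched as a single decision function in Python. This is a rough illustration under stated assumptions, not the RepOD implementation: the file model (a dict with `restricted` and `terms`, where `terms` holds the licence name or, for restricted files, the subterms text), the function names, and the sample licence name are all hypothetical. A dataset that mixes restricted and unrestricted files is treated as "different licences AND at least one Restricted Access", which the proposal does not state explicitly.

```python
import xml.etree.ElementTree as ET

MIXED = "Different licenses or terms for individual files."


def terms_notes(files, embargo_date=None):
    """Return (TOU text, TOA text, setAvail text) for the proposed <dataAccs>."""
    if embargo_date is not None:
        return ("",
                "Access to all files in this dataset is embargoed.",
                f"Files in this dataset will be available from {embargo_date}.")
    restricted = [f for f in files if f["restricted"]]
    distinct_terms = {f["terms"] for f in files}
    if not restricted:
        # One shared licence name, or the generic "different licences" text.
        tou = distinct_terms.pop() if len(distinct_terms) == 1 else MIXED
        return (tou, "", "")
    if len(restricted) < len(files):
        # Assumption: any mix of restricted and open files counts as this case.
        return (MIXED, "Access to some files in this dataset is restricted.", "")
    toa = "Access to all files in this dataset is restricted. "
    if len(distinct_terms) == 1:
        toa += distinct_terms.pop()
    else:
        toa += "Different terms for individual files."
    return ("", toa, "")


def build_data_accs(files, embargo_date=None):
    """Serialize the notes into a <dataAccs> element as a string."""
    tou, toa, set_avail = terms_notes(files, embargo_date)
    root = ET.Element("dataAccs")
    for note_type, text in (("DVN:TOU", tou), ("DVN:TOA", toa)):
        note = ET.SubElement(root, "notes", {"type": note_type, "level": "dv"})
        note.text = text
    ET.SubElement(root, "setAvail").text = set_avail or None
    ET.SubElement(root, "useStmt")
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset: two open files under the same licence.
print(build_data_accs([{"restricted": False, "terms": "CC0"},
                       {"restricted": False, "terms": "CC0"}]))
```

Keeping the decision logic in `terms_notes` separate from the XML serialization makes each case easy to check against the list above.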
RepOD -> Harvard Dataverse
Format DublinCore:
relatedDatasets is filled (we will fail in such cases)
Example of a dataset with all metadata filled in:
https://repod-test.icm.edu.pl/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=doi:10.18150/FK2/AJ4VYO
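For checking other records or formats, the OAI-PMH GetRecord URLs quoted in this issue can be assembled with a small Python helper. The function name is hypothetical; the base URL, identifier, and metadata prefixes are the ones that appear in this issue.

```python
from urllib.parse import urlencode

BASE_URL = "https://repod-test.icm.edu.pl/oai"


def get_record_url(identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord URL, keeping ':' and '/' in the DOI readable."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    return f"{BASE_URL}?{urlencode(query, safe=':/')}"


# The two formats referenced in this issue: Dublin Core and Dataverse JSON.
for prefix in ("oai_dc", "dataverse_json"):
    print(get_record_url("doi:10.18150/FK2/AJ4VYO", prefix))
```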