Closed by jbrown-xentity 1 year ago
mdTranslator translates mdJson input into one or more established metadata standards:
reader(from): fgdc, mdJson, sbJson
writer(to): fgdc, html, iso19110, iso19115_1, iso19115_2
We need to add the following modules for our source data formats and DCAT-US output:
reader: iso19115, arcgis
writer: dcat-us
Here is an example of adding a writer to translate from FGDC XML to DCAT-US (JSON), covering title and description only:
require 'jbuilder'
require 'rubygems'
require_relative 'dcatusJson_dataset'

module ADIWG
  module Mdtranslator
    module Writers
      module DcatusJson
        module DcatusJson
          # Build the top-level DCAT-US catalog object from the internal object.
          def self.build(intObj, hResponseObj)
            Jbuilder.new do |json|
              json.conformsTo 'https://project-open-data.cio.gov/v1.1/schema'
              # DCAT-US names this key "@type", so it has to go through set!.
              json.set! '@type', 'dcat:Catalog'
              json.dataset Dataset.build(intObj[:metadata])
            end
          end # build
        end # DcatusJson
      end
    end
  end
end
dcatusJson_dataset.rb:
require 'jbuilder'
require_relative 'dcatusJson_resourceInfo'

module ADIWG
  module Mdtranslator
    module Writers
      module DcatusJson
        module Dataset
          @Namespace = ADIWG::Mdtranslator::Writers::DcatusJson

          def self.build(hMetadata)
            resourceInfo = hMetadata[:resourceInfo]
            hCitation = resourceInfo[:citation]
            Jbuilder.new do |json|
              json.title hCitation[:title]
              json.description resourceInfo[:abstract]
            end
          end # build
        end # Dataset
      end
    end
  end
end
Below is sample Python code that could be used to translate FGDC XML files into DCAT-US (JSON) format:
import xmltodict
import json

with open('input.xml') as xml_file:
    data = xml_file.read()

fgdc_dict = xmltodict.parse(data)
dcat_us_dict = {
    'conformsTo': 'https://project-open-data.cio.gov/v1.1/schema',
    '@type': 'dcat:Catalog',
    'dataset': {
        'title': fgdc_dict['metadata']['idinfo']['citation']['citeinfo']['title'],
        'description': fgdc_dict['metadata']['idinfo']['descript']['abstract'],
    }
}

with open('dcatus_output.json', 'w') as json_file:
    json.dump(dcat_us_dict, json_file, indent=4)
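The same title/description mapping can also be sketched with only the Python standard library, in a way that tolerates records where an element is missing. This is a rough sketch, not tested against real FGDC records: the `FGDC_TO_DCAT` table and the `fgdc_to_dcat_us` function name are made up for illustration, and writing `dataset` as a one-element list is an assumption about how the DCAT-US catalog is usually shaped (the snippet above writes a single object).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: DCAT-US dataset key -> FGDC element path under <metadata>.
FGDC_TO_DCAT = {
    "title": "idinfo/citation/citeinfo/title",
    "description": "idinfo/descript/abstract",
}


def fgdc_to_dcat_us(xml_text):
    """Map an FGDC XML string to a minimal DCAT-US catalog dict."""
    root = ET.fromstring(xml_text)  # the root element is expected to be <metadata>
    dataset = {}
    for key, path in FGDC_TO_DCAT.items():
        text = root.findtext(path)  # None when the element is absent
        if text is not None and text.strip():
            dataset[key] = text.strip()
    return {
        "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
        "@type": "dcat:Catalog",
        "dataset": [dataset],  # assumption: a list of datasets
    }
```

Missing or empty elements are simply left out of the output instead of raising a `KeyError`, which makes it easier to run over a mixed batch of source files and see what is absent.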
mdTranslator includes several features that we do not need for our translation. Our goal is only to extract the relevant information from the input files and generate DCAT-US output.
Translation is also only one step in the overall ETL process. Additional steps such as validation and remote file handling need to be considered, so implementing a complete process that covers all of the necessary parts will increase efficiency and make maintenance easier.
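As a rough idea of what the validation step could look like, here is a minimal sketch that reports which required fields a translated dataset is still missing. The field list is an assumption based on the DCAT-US v1.1 "required" dataset fields and should be checked against the published schema, and `missing_required_fields` is a hypothetical helper name, not part of mdTranslator.

```python
# Assumed required dataset fields in DCAT-US v1.1; verify against the schema.
REQUIRED_DATASET_FIELDS = (
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
)


def missing_required_fields(dataset):
    """Return the required field names that are absent or empty in a dataset dict."""
    return [name for name in REQUIRED_DATASET_FIELDS if not dataset.get(name)]
```

For the title-and-description output produced so far, this would flag every other required field, which gives a quick picture of how much of the mapping is still to be written. A full check would validate against the JSON schema itself rather than a hand-kept list.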
Following the example from cf-hello-worlds, I deployed a simple Ruby app to the cloud.gov sandbox.
Created a GSA fork to save the testing code: https://github.com/GSA/mdTranslator
Also pushed a Ruby app to the cloud.gov sandbox: https://metadata-translator-test-unexpected-squirrel-lo.app.cloud.gov/
Looks great, thanks @Jin-Sun-tts !!
Purpose
We want to see if mdTranslator is a useful tool to build off of.
Given the uncertainty above, testing is needed to give us a factual basis for deciding next steps.
Three days of effort have been allocated; once complete, the findings will be demonstrated and specific future actions will be decided.
Acceptance Criteria
[ACs should be clearly demo-able/verifiable whenever possible. Try specifying them using BDD.]
Background
https://docs.google.com/document/d/1XzfTrPxu-asJ_55GoeZ2UOJsie9CuCegStS28BAL_40/edit#heading=h.pallknm1j7lu
https://github.com/adiwg/mdTranslator
https://mdtools.adiwg.org/
Sketch
The developer should be able to fully set up the local dev environment for https://github.com/adiwg/mdTranslator. Ideally, as time permits, a new output format should be created that can export a DCAT-US (JSON) object with a title and description. If there is still time, we could explore deploying to the cloud.gov environment as a new app, using the cloud.gov Ruby Hello World examples.