NCEAS / open-science-codefest

Web site and planning materials for open science conference.
http://nceas.github.io/open-science-codefest

Develop a Common SOLR Index Schema for Cataloging Science Metadata #8

Open mbjones opened 9 years ago

mbjones commented 9 years ago

Organizational Page: SOLRMeta
Category: Data science
Title: Develop a Common SOLR Index Schema for Cataloging Science Metadata
Proposed by: Dave Vieglais
Participants:
Summary: SOLR is an open source, scalable, high performance search engine that can be used for searching broad categories of information. The goal of this session is to develop a SOLR Schema that enables effective search against common scientific metadata formats, with emphasis on the earth sciences. Such a common schema could be leveraged by many data repositories to provide consistent discovery semantics against their repositories while not precluding more specialized capabilities appropriate for specific repositories.
Technologies: XML, SOLR, Lucene, EML, FGDC, ISO19115, Dublin Core, Darwin Core

sckott commented 9 years ago

Cool, any reason why not to do an Elasticsearch schema too?

mbjones commented 9 years ago

Probably not -- seems like they could share most of the fields anyways, and the main difference would be in representation and syntax. What do you think @vdave ?

datadavev commented 9 years ago

No reason at all. It would be great to come up with a technology agnostic set of fields that could augment existing simple cores such as Dublin Core and Darwin Core with properties generally useful to earth sciences. The "Earth Core"?
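
As an illustration of the technology-agnostic idea, here is a minimal sketch in Python that keeps one neutral field list and renders it both as Solr `<field>` declarations and as an Elasticsearch mapping. The field names (`identifier`, `title`, `abstract`, `keywords`, `begin_date`) and the helper functions are hypothetical and are not taken from any agreed schema. The Solr type names (`string`, `text_general`, `pdate`) are assumed to match the stock Solr default configset and may differ in a given installation.

```python
import json
from xml.sax.saxutils import quoteattr

# Hypothetical neutral field definitions: (name, kind, multi_valued).
# These names are placeholders for illustration, not an agreed "Earth Core".
FIELDS = [
    ("identifier", "keyword", False),
    ("title", "text", False),
    ("abstract", "text", False),
    ("keywords", "keyword", True),
    ("begin_date", "date", False),
]

# Assumed type names; the Solr ones follow the default configset.
SOLR_TYPES = {"keyword": "string", "text": "text_general", "date": "pdate"}
ES_TYPES = {"keyword": "keyword", "text": "text", "date": "date"}


def to_solr_fields(fields):
    """Render the neutral field list as Solr schema <field> elements."""
    lines = []
    for name, kind, multi in fields:
        lines.append(
            '<field name=%s type=%s indexed="true" stored="true" multiValued=%s/>'
            % (quoteattr(name), quoteattr(SOLR_TYPES[kind]), quoteattr(str(multi).lower()))
        )
    return "\n".join(lines)


def to_es_mapping(fields):
    """Render the neutral field list as an Elasticsearch mapping body.

    Elasticsearch has no multi-valued flag (any field may hold an array),
    so that attribute is dropped here.
    """
    properties = {name: {"type": ES_TYPES[kind]} for name, kind, _ in fields}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(to_solr_fields(FIELDS))
    print(json.dumps(to_es_mapping(FIELDS), indent=2))
```

The field list itself carries the shared semantics; only the two small rendering functions are engine-specific, which is the sense in which the main difference would be in representation and syntax.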

sckott commented 9 years ago

Okay, cool

lewismc commented 9 years ago

@mbjones I am actively developing Apache OODT [0], which we (the Jet Propulsion Laboratory) use for cataloging scientific data, including metadata. We've got a fairly good idea of how and what we wish to catalog; however, there is always room for potential improvement. I would therefore be interested in contributing to this workshop.

[0] http://oodt.apache.org

mbjones commented 9 years ago

@lewismc Glad to hear you are interested. It would be great to discuss the overlaps between the OODT approach to cataloging metadata and the DataONE metadata index which @vdave and I contributed to and that inspired this activity. I think there's a lot of room for this kind of cross-standard agreement on metadata terms. Looking forward to it at the Codefest.

chrismattmann commented 9 years ago

Yep, agreed, I think we could probably have a lot to look at in terms of the overlaps. We've already done a Solr science data schema through the OODT File Manager component, so we could draw from that.

datadavev commented 9 years ago

Initial work on this topic is in this Google Sheet:

https://docs.google.com/spreadsheets/d/1VnEF0oezHlP2U98mmbZR9iNSSJjIMsgambTMOinnvSA/edit#gid=0

Also some notes in etherpad at:

https://etherpad.mozilla.org/20140903-solr-codefest