ESIPFed / gsoc

Project ideas and mentor guidance for ESIP members to participate in Google Summer of Code.
Apache License 2.0

Data Management Training Clearinghouse Analytics Enhancement - Google Summer of Code Project #21

Status: Closed (by karlbenedict, 5 years ago)

karlbenedict commented 5 years ago


ESIP Member Organization Name:

University of New Mexico - University Libraries

Mentors:

Karl Benedict, Jon Wheeler

Information for Students:

General information for students can be found here. Additional information about the ESIP Data Management Training Clearinghouse can be obtained by contacting @karlbenedict, and additional information about the RAMP platform can be obtained by contacting @jonathanwheeler01.

Project Ideas:

Idea Title:

Data Management Training Clearinghouse Analytics Enhancement

Abstract:

The ESIP-hosted Data Management Training Clearinghouse (DMTC) is a registry of information about educational training resources focused on science data management and other data science capacity-building skills. The registry is a web portal with a Drupal backend content management system (CMS). For long-term sustainability, DMTC supporters see a need to track usage metrics on both the functionality and the users of the DMTC, extending the Google Analytics options included in the Drupal CMS. Over the course of two projects funded by the Institute of Museum and Library Services (IMLS), the RAMP platform has been developed to provide an index of Google Search data for repositories registered with RAMP. Recent updates to the RAMP platform include a public, read-only API that participating repositories can use to integrate customized search performance metrics into local analytic services. Integrating RAMP-provided search statistics with the detailed metadata in the DMTC will enable fine-grained examination of the relationship between search results captured by Google and the characteristics of the Clearinghouse resources exposed through Google search. This project will provide the DMTC with a local analytic dashboard that gives DMTC developers and contributors a better understanding of the dynamics of search and access for the resources registered in the Clearinghouse.

Technical Details

The project will require extracting data from multiple sources: the data captured and presented through Google Search Console and accessed through the RAMP Elasticsearch API, and the Drupal API and underlying Solr database for the DMTC. These data will then need to be integrated to support the development of analytic tools (to be written in JavaScript) within the DMTC web interface. The functional requirements for these tools will be developed in consultation with selected DMTC stakeholders (including members of the IMLS-funded DMTC Enhancement Project Leadership and Advisory Board, and DMTC content contributors). The resulting tools will be reviewed and iterated on based on feedback from the same stakeholders.
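The integration step described above can be pictured as a join keyed on the resource URL. The following is a minimal sketch only: the record shapes (`url`, `clicks`, `impressions` for search rows; `url`, `title`, `keywords` for DMTC metadata) and the function names `normalizeUrl` and `joinSearchWithMetadata` are assumptions made for illustration, not the actual RAMP or DMTC schemas.

```javascript
// Hypothetical record shapes; the real RAMP and DMTC fields may differ.
// searchRows: [{ url, clicks, impressions }]
// resources:  [{ url, title, keywords }]

// Reduce a URL to host + path so that scheme and trailing-slash
// differences do not prevent a match.
function normalizeUrl(url) {
  const u = new URL(url);
  const path = u.pathname.replace(/\/+$/, "");
  return `${u.hostname}${path}`;
}

function joinSearchWithMetadata(searchRows, resources) {
  // Sum clicks and impressions per normalized URL.
  const totals = new Map();
  for (const row of searchRows) {
    const key = normalizeUrl(row.url);
    const t = totals.get(key) || { clicks: 0, impressions: 0 };
    t.clicks += row.clicks;
    t.impressions += row.impressions;
    totals.set(key, t);
  }
  // Attach totals to each resource; resources without search data get zeros.
  return resources.map((res) => {
    const t = totals.get(normalizeUrl(res.url)) || { clicks: 0, impressions: 0 };
    return { ...res, clicks: t.clicks, impressions: t.impressions };
  });
}

// Made-up sample data, for illustration only.
const joined = joinSearchWithMetadata(
  [
    { url: "http://example.org/node/1", clicks: 3, impressions: 40 },
    { url: "http://example.org/node/1/", clicks: 2, impressions: 10 },
  ],
  [
    { url: "http://example.org/node/1", title: "Sample resource A", keywords: ["metadata"] },
    { url: "http://example.org/node/2", title: "Sample resource B", keywords: ["storage"] },
  ]
);
console.log(joined);
```

Keeping resources that have no search data (with zero counts) is a deliberate choice here, since resources that never surface in search may be as interesting to stakeholders as those that do.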

Helpful Experience

Google Analytics and Search Console, Drupal version 7, Elasticsearch, JavaScript data visualization and analysis frameworks, RAMP API (see below for a link to the high-level RAMP API documentation).

First Steps

Review the high-level RAMP documentation available here.

Become familiar with the ESIP Data Management Training Clearinghouse (at [http://dmtclearinghouse.esipfed.org](http://dmtclearinghouse.esipfed.org)) to gain experience with:

Start thinking about opportunities to integrate the search result information provided by Google's Search Console (via RAMP's index) with the metadata in the Clearinghouse. The goal is a collection of analytic tools (perhaps as part of a dashboard) that better characterize discovery and access patterns for content within the Clearinghouse.
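As one possible starting point for such a dashboard, search performance could be summarized by a resource characteristic. The sketch below groups records by a metadata field and computes a click-through rate (clicks divided by impressions) per group. The field names (`keywords`, `clicks`, `impressions`) and the function `summarizeBy` are hypothetical and stand in for whatever the RAMP index and DMTC metadata actually expose.

```javascript
// Hypothetical input: records that already combine DMTC metadata with
// search counts, e.g. { keywords: ["metadata"], clicks: 5, impressions: 50 }.
function summarizeBy(records, field) {
  const groups = new Map();
  for (const rec of records) {
    // A field may hold a single value or a list (e.g. several keywords).
    const values = Array.isArray(rec[field]) ? rec[field] : [rec[field]];
    for (const value of values) {
      const g = groups.get(value) || { clicks: 0, impressions: 0 };
      g.clicks += rec.clicks;
      g.impressions += rec.impressions;
      groups.set(value, g);
    }
  }
  return [...groups].map(([value, g]) => ({
    value,
    clicks: g.clicks,
    impressions: g.impressions,
    // Report null rather than dividing by zero when there were no impressions.
    ctr: g.impressions > 0 ? g.clicks / g.impressions : null,
  }));
}

// Made-up sample data, for illustration only.
const summary = summarizeBy(
  [
    { keywords: ["metadata", "storage"], clicks: 2, impressions: 10 },
    { keywords: ["metadata"], clicks: 1, impressions: 5 },
    { keywords: ["preservation"], clicks: 0, impressions: 0 },
  ],
  "keywords"
);
console.log(summary);
```

Note that a record with several keywords contributes to each of its groups, so group totals can add up to more than the overall total; whether that is appropriate would depend on the questions stakeholders want the dashboard to answer.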