mbjones commented 1 year ago

We will want to distribute our lessons widely, and structured metadata for the lessons will help with search and discovery. The ESIP Data Management Training Clearinghouse has an established metadata vocabulary for courses that would make a good starting point for our work; see below for an example of that metadata. But we should also evaluate metadata vocabularies used in other related initiatives, including the Carpentries (see their example format for lessons), the NEON modules, and EarthLab courses.

Here's an example metadata record from the DMTC for a DataONE lesson on metadata:

[
  {
    "title": "DataONE Data Management Module 07: Metadata",
    "status": 1,
    "pub_status": "published",
    "modification_date": "2022-06-21T10:56:57Z",
    "resource_modification_date": "1900-01-01T00:00:00Z",
    "url": "https://dataoneorg.github.io/Education/lessons/07_metadata/index.html",
    "access_cost": 0,
    "submitter_name": "Amber Budden",
    "submitter_email": "aebudden@dataone.unm.edu",
    "authors": [],
    "author_names": [],
    "author_org": {
      "name": "DataONE Community Engagement and Outreach Working Group",
      "name_identifier": "",
      "name_identifier_type": ""
    },
    "contact": {
      "name": "Amber E.  Budden",
      "org": "DataONE",
      "email": ""
    },
    "abstract_data": "What is metadata? Metadata is data (or documentation) that describes and provides context for data and it is everywhere around us. Metadata allows us to understand the details of a dataset, including: where it was collected, how it was collected, what gaps in the data mean, what the units of measurement are, who collected the data, how it should be attributed etc. By creating and providing good descriptive metadata for our own data, we enable others to efficiently discover and use the data products from our research. This lesson explores the importance of metadata to data authors, users of the data and organizations, and highlights the utility of metadata. It provides an overview of the different metadata standards that exist, and the core elements that are consistent across them; guiding users in selecting a metadata standard to work with and introduces the best practices needed for writing a high quality metadata record.&nbsp;<br />\r\nThis 30-40 minute&nbsp;lesson&nbsp;includes&nbsp;a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.<br />\r\n&nbsp;",
    "abstract_format": "filtered_html",
    "subject": "",
    "keywords": [
      "Data lifecycle",
      "Data management",
      "Metadata"
    ],
    "license": "Creative Commons 0 - CC0 \"No Rights Reserved\" (Public Domain)",
    "usage_info": "",
    "citation": "DataONE Community Engagement & Outreach Working Group (2017) \"Metadata Management\". Accessed at https://dataoneorg.github.io/Education/lessons/07_metadata/ on Jun 01, 2018",
    "locator_data": "",
    "locator_type": "",
    "publisher": "DataONE",
    "version": "v1 - 05.01.2012",
    "created": "2016-09-22T14:24:18",
    "published": "2012-05-01T00:00:00Z",
    "accessibility_features": [
      {
        "name": "Transformation - features that allow the content to be changed for ease of access, e.g., by using large print fonts."
      }
    ],
    "accessibility_summary": "",
    "language_primary": "en",
    "languages_secondary": [],
    "ed_frameworks": [
      {
        "name": "DataONE Education Modules",
        "nodes": [
          {
            "description": "",
            "name": "Describe"
          }
        ],
        "type": "framework"
      },
      {
        "name": "FAIR Data Principles",
        "nodes": [],
        "type": "framework"
      }
    ],
    "target_audience": [],
    "purpose": "Instruction - detailed information about aspects or processes related to data management or data skills.",
    "completion_time": "Up to 1 hour",
    "media_type": "Presentation - representation of the particular way in which an author shows, describes or explains one or more concepts, e.g., a set of Powerpoint slides.",
    "lr_type": "Unit -  long-range plan of instruction on a particular concept containing multiple, related lessons.",
    "creator": "sophisticus",
    "md_record_id": "",
    "ratings": [],
    "rating": 0,
    "id": "32335b19-8e6f-3772-aacc-1379d70330bb",
    "contributors": [],
    "contributor_orgs": [
      {
        "name": "DataONE",
        "name_identifier": "N.A.",
        "name_identifier_type": "N.A.",
        "type": "Final product"
      }
    ],
    "score": 4.3308573,
    "country_of_origin": null,
    "credential_status": null,
    "notes": []
  }
]

Each lesson in the DMTC has an identifier, which you can use to download the lesson metadata from the DMTC API (see DMTC API documentation) with the following command:

curl -s 'https://dmtclearinghouse.esipfed.org/api/resources/?id=32335b19-8e6f-3772-aacc-1379d70330bb' | jq .results
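For scripting beyond `jq`, here's a rough Python sketch of the same lookup using only the standard library. It assumes, as the `jq .results` filter suggests, that the API returns a JSON object with a top-level `results` array. The helper names (`dmtc_record_url`, `extract_results`, `fetch_dmtc_record`) are hypothetical, not part of any DMTC client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DMTC_API = "https://dmtclearinghouse.esipfed.org/api/resources/"


def dmtc_record_url(record_id: str) -> str:
    """Build the same query URL used in the curl command."""
    return f"{DMTC_API}?{urlencode({'id': record_id})}"


def extract_results(body: str) -> list:
    """Return the 'results' array from a response body (like `jq .results`)."""
    return json.loads(body).get("results", [])


def fetch_dmtc_record(record_id: str) -> list:
    # Network call; assumes the endpoint returns JSON with a top-level "results" key.
    with urlopen(dmtc_record_url(record_id), timeout=30) as resp:
        return extract_results(resp.read().decode("utf-8"))
```

Calling `fetch_dmtc_record("32335b19-8e6f-3772-aacc-1379d70330bb")` should then return a list like the record above, assuming the endpoint is reachable and behaves as the curl example implies.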

carmengg commented 1 year ago

Here is some information about how the Data Carpentries, NEON lectures, Earth Lab Lessons, and the Data Management Training Clearinghouse (DMTC) add metadata for archiving and organizing their modules.

Data Carpentries

Data Carpentries lessons are organized into episodes.

Each episode’s YAML header must contain:
- the episode’s title
- time estimates for teaching and exercises
- motivating questions
- lesson objectives
- a summary of key points

For example, this is the YAML for the Using RMarkdown episode:

---
source: Rmd
title: "Using RMarkdown"
teaching: 10
exercises: 2
questions:
- "How to write a lesson using RMarkdown?"
objectives:
- "Explain how to use RMarkdown with the new lesson template."
- "Demonstrate how to include pieces of code, figures, and challenges."
keypoints:
- "Edit the .Rmd files not the .md files"
- "Run `make serve` to knit documents and preview lesson website locally"
---

NOTES: I really like the teaching and exercises timing information (given in minutes). It could also help distinguish modules where participants mostly work on exercises from modules with more lecturing. However, I would rename teaching and exercises to something like teaching-time and exercises-time to make it explicit that they are times (at first I thought exercises: 2 meant there were 2 exercises).
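A quick Python sketch of how these header fields could be checked and summed. It assumes the header has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that `teaching` and `exercises` are minutes, as in the Carpentries template. The function names are hypothetical; `exercise_share` is one possible way to separate exercise-heavy episodes from lecture-heavy ones.

```python
# Keys the Carpentries template expects in each episode header.
REQUIRED_EPISODE_KEYS = ("title", "teaching", "exercises", "questions", "objectives", "keypoints")


def missing_episode_keys(header: dict) -> list:
    """List required keys that are absent from a parsed episode header."""
    return [key for key in REQUIRED_EPISODE_KEYS if key not in header]


def episode_minutes(header: dict) -> int:
    """Total planned time: teaching minutes plus exercise minutes."""
    return int(header.get("teaching", 0)) + int(header.get("exercises", 0))


def exercise_share(header: dict) -> float:
    """Fraction of planned time spent on exercises (0.0 means all lecture)."""
    total = episode_minutes(header)
    return int(header.get("exercises", 0)) / total if total else 0.0
```

For the Using RMarkdown header above, this gives 12 minutes in total, of which about 17% is exercises.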

NEON teaching modules

NEON teaching modules are part of the Quantitative Undergraduate Biology Education and Synthesis (QUBES) lesson repository. According to their website,

The QUBES platform hosts hundreds of teaching materials, reference materials, and cloud-based software free to use and adapt using open Creative Commons licenses.

In the Filters tab of the QUBES lesson browser, there are three main tag categories associated with each lesson. Here they are, with their corresponding subtypes:

Resource Type:
- Teaching material
- Reference material
- Dataset
Audience level:
- High School
- Undergraduate
- Graduate
- Faculty
Activity length:
- Less than 1 hour
- 1 Hour
- More than 1 hour
- Extended Project

Other, less frequently used tag categories are:

Software Used
Inclusive Pedagogy for Life Science Education
QUBES Universal Design Tagging Ontology
Open Science and Education Practices Ontology

The NEON teaching modules use the first three tag categories and call them “alignments”. Here's an example module. Most modules have further subcategories for each alignment. For example:

Audience Level
- Undergraduate
  - Introductory
    - Non-majors
    - Majors
  - Advanced
Resource Type
- Teaching material
  - Homework
  - Lecture
  - Lab
  - Online course
- Dataset
  - Raw
  - Cleaned

The NEON teaching modules also include a citation with the author, year of publication, title, version number, and DOI. Example:

Lesley Bulluck (2019). Testing hypotheses about the role of wildfire in structuring avian communities. NEON Faculty Mentoring Network, (Version 2.0). QUBES Educational Resources. doi:10.25334/R2S1-4S62
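If we store these parts as separate fields, the citation string can be generated instead of typed by hand. Here's a minimal Python sketch that reproduces the layout of the example above. The class and field names are my own choice, not a QUBES schema.

```python
from dataclasses import dataclass


@dataclass
class LessonCitation:
    author: str
    year: int
    title: str
    series: str
    version: str
    publisher: str
    doi: str

    def render(self) -> str:
        # Mirrors the layout of the NEON/QUBES example citation.
        return (
            f"{self.author} ({self.year}). {self.title}. {self.series}, "
            f"(Version {self.version}). {self.publisher}. doi:{self.doi}"
        )
```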

NOTES: I really liked the three core alignments: audience level, resource type, and activity length. Activity length could complement the other time-tracking metadata about lecturing and exercises. I was surprised the “software used” tag wasn’t used more often; it would be good for us to include too. Setting software used = None for non-technical modules could also be an option.
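One way activity length could complement the minute counts is to derive it from them. This Python sketch assumes "1 Hour" means exactly 60 minutes and treats "Extended Project" as a manual flag, since it can't be inferred from a minute count. Both are assumptions, not QUBES definitions, and `activity_length` is a hypothetical helper.

```python
def activity_length(total_minutes: int, extended_project: bool = False) -> str:
    """Map planned minutes to a QUBES-style activity-length label.

    Assumption: "1 Hour" means exactly 60 minutes. "Extended Project"
    cannot be inferred from minutes, so it is set with a flag.
    """
    if extended_project:
        return "Extended Project"
    if total_minutes < 60:
        return "Less than 1 hour"
    if total_minutes == 60:
        return "1 Hour"
    return "More than 1 hour"
```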

Earth Lab Lessons

Earth Lab only hosts resources developed by Earth Lab at the University of Colorado Boulder. The lessons are organized into the following topics:

LESSONS BY TOPIC:

Remote sensing
Earth science
Social science
Time series
Data exploration and analysis
Spatial data and gis
Reproducible science and programming
Find and manage data
File formats

NOTES: Having a topic tag could be good and these are some possible values for it.
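If we adopt these as a controlled vocabulary, a small normalizer would keep tag spellings consistent. Here's a Python sketch with the topic list copied from above and a hypothetical `normalize_topic` helper that matches topics regardless of case and extra whitespace.

```python
EARTH_LAB_TOPICS = (
    "Remote sensing",
    "Earth science",
    "Social science",
    "Time series",
    "Data exploration and analysis",
    "Spatial data and gis",
    "Reproducible science and programming",
    "Find and manage data",
    "File formats",
)

# Lookup keyed by a case-folded form of each canonical topic name.
_TOPIC_LOOKUP = {topic.casefold(): topic for topic in EARTH_LAB_TOPICS}


def normalize_topic(raw: str) -> str:
    """Return the canonical topic name, ignoring case and extra whitespace."""
    key = " ".join(raw.split()).casefold()
    if key not in _TOPIC_LOOKUP:
        raise ValueError(f"unknown topic: {raw!r}")
    return _TOPIC_LOOKUP[key]
```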

Data Management Training Clearinghouse (DMTC)

According to their website:

Data Management Training Clearinghouse (DMTC) is a registry for excellent online learning resources focusing on data skills and capacity building for research data management, data stewardship and data education.

Learning resources can be browsed using the following filters:


Keywords
Author Organization(s)
Authoring Person(s) Names
Original Languages
Additional Languages
Target Audiences
Access Cost
License
Accessibility Features
Subject Discipline
Media Type
Educational Purpose
Educational Frameworks
Publication Status
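Several of these filters map directly onto fields in the record Matt shared (`keywords`, `language_primary`, `access_cost`). Here's a rough Python sketch of faceted filtering over such records. It assumes an `access_cost` of 0 means free, and it only illustrates the idea; it is not how the DMTC implements its filters.

```python
from typing import Optional


def matches_filters(record: dict, keyword: Optional[str] = None,
                    language: Optional[str] = None, free_only: bool = False) -> bool:
    """Return True if a DMTC-style record passes every filter that is set."""
    if keyword is not None:
        wanted = keyword.casefold()
        if not any(k.casefold() == wanted for k in record.get("keywords", [])):
            return False
    if language is not None and record.get("language_primary") != language:
        return False
    # Assumption: access_cost == 0 means the resource is free.
    if free_only and record.get("access_cost", 0) != 0:
        return False
    return True


def apply_filters(records: list, **filters) -> list:
    """Keep only the records that match all of the given filters."""
    return [r for r in records if matches_filters(r, **filters)]
```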

Additionally, these are the tags from the metadata Matt shared:

title
status
pub_status
modification_date
resource_modification_date
url
access_cost
submitter_name
submitter_email
authors
author_names
author_org
contact
abstract_data
keywords
license
usage_info
citation
locator_data
locator_type
publisher
version
created
published
accessibility_features
accessibility_summary
language_primary
languages_secondary
ed_frameworks
target_audience
purpose
completion_time
media_type
lr_type
creator
md_record_id
ratings
rating
id
contributors
contributor_orgs
score
country_of_origin
credential_status
notes

NOTES: Many of the fields in the complete record relate to submitting the module to the DMTC, so these would be needed when we submit. Others, like access_cost, license, language_primary, and languages_secondary, will probably be the same for all the modules we develop, so we could leave these out for now. Keywords, target_audience, completion_time, and title are some we could add right away. Once the module becomes stable, we could include url, citation, and authors.
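The staged approach in this note could be expressed as two field groups and a projection over a DMTC-style record. Here's a Python sketch; the group names `ADD_NOW` and `ADD_WHEN_STABLE` and the `staged_metadata` helper are hypothetical.

```python
# Field groups taken from the note above; the grouping itself is a proposal.
ADD_NOW = ("title", "keywords", "target_audience", "completion_time")
ADD_WHEN_STABLE = ("url", "citation", "authors")


def staged_metadata(record: dict, stable: bool = False) -> dict:
    """Project a DMTC-style record onto the fields we plan to maintain."""
    fields = ADD_NOW + (ADD_WHEN_STABLE if stable else ())
    return {name: record[name] for name in fields if name in record}
```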

carmengg commented 1 year ago

Summarized Thoughts

We could add metadata in stages. To begin, these could be some informative tags that align well with the learning-material repositories reviewed above:

title
topic (follow Earth Lab Lessons categories)
subject discipline (follow DMTC categories)
resource-type
- teaching material
  - lecture
  - lab
- dataset
  - raw
  - cleaned
activity length (same as completion time in DMTC)
- less than 1 hour
- 1 hour
- more than 1 hour
- extended project
teaching time
exercises time
audience level:
- undergraduate
- graduate
- faculty
software used
objectives
keywords
authors
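To make the proposal concrete, here's a Python dataclass sketch of these fields, with checks against the controlled values listed above. Field names use underscores instead of hyphens, and times are assumed to be minutes. `software_used=None` marks non-technical modules. Topic and subject discipline are left unvalidated until we settle the Earth Lab and DMTC lists. Everything here is a proposal, not a settled schema.

```python
from dataclasses import dataclass, field
from typing import Optional

ACTIVITY_LENGTHS = {"less than 1 hour", "1 hour", "more than 1 hour", "extended project"}
AUDIENCE_LEVELS = {"undergraduate", "graduate", "faculty"}
RESOURCE_TYPES = {
    "teaching material": {"lecture", "lab"},
    "dataset": {"raw", "cleaned"},
}


@dataclass
class LessonMetadata:
    title: str
    topic: str
    subject_discipline: str
    resource_type: str
    resource_subtype: str
    activity_length: str
    teaching_time: int  # minutes (assumed unit)
    exercises_time: int  # minutes (assumed unit)
    audience_level: list = field(default_factory=list)
    software_used: Optional[list] = None  # None for non-technical modules
    objectives: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    authors: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable issues; an empty list means the record looks valid."""
        issues = []
        if self.activity_length not in ACTIVITY_LENGTHS:
            issues.append(f"unknown activity length: {self.activity_length!r}")
        subtypes = RESOURCE_TYPES.get(self.resource_type)
        if subtypes is None:
            issues.append(f"unknown resource type: {self.resource_type!r}")
        elif self.resource_subtype not in subtypes:
            issues.append(f"unknown subtype for {self.resource_type}: {self.resource_subtype!r}")
        for level in self.audience_level:
            if level not in AUDIENCE_LEVELS:
                issues.append(f"unknown audience level: {level!r}")
        if self.teaching_time < 0 or self.exercises_time < 0:
            issues.append("times must be non-negative")
        return issues
```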

carmengg commented 1 year ago

Follow-up to Matt’s comments about Earth Lab’s metadata.

I looked through Earth Lab’s GitHub. The lessons can be found in the _posts folder. This is an example:

Rendered lesson

md file

YAML header:

---
layout: single
category: courses
title: "Time Series Data in Python"
permalink: /courses/use-data-open-source-python/use-time-series-data-in-python/
week-landing: 1
modified: 2020-09-11
week: 1
sidebar:
  nav:
comments: false
author_profile: false
course: "intermediate-earth-data-science-textbook"
module-type: 'session'
---

From these, I’d suggest we add ‘course’ and maybe ‘permalink’ (the URL), at least once we have a rendered final version of the lessons.
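If we do add `permalink`, the full lesson URL can be built from it and a site base URL. Here's a Python sketch that works on an already-parsed front matter dict. The base URL `https://www.earthdatascience.org` is my assumption about where Earth Lab renders these pages, and `lesson_links` is a hypothetical helper.

```python
from urllib.parse import urljoin

# Assumed base URL of the rendered site; ours would differ.
BASE_URL = "https://www.earthdatascience.org"


def lesson_links(front_matter: dict, base_url: str = BASE_URL) -> dict:
    """Pull 'course' and a full URL (base + permalink) out of parsed front matter."""
    permalink = front_matter.get("permalink")
    return {
        "course": front_matter.get("course"),
        "url": urljoin(base_url, permalink) if permalink else None,
    }
```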

carmengg commented 1 year ago

Hi @mbjones. Here's a sample YAML file for the sections, let me know what you think:

https://github.com/cyber2a/cyber2a-course/blob/main/section_metadata_blank.yml

It might be a bit too long to have at the beginning of each lesson. Is it possible to have a separate .yml file associated with each lesson's .qmd file?

mbjones commented 1 year ago

Good question. I don't know if you can include another file.
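Quarto's documentation describes a `metadata-files` option and directory-level `_metadata.yml` files, which may cover this, so it's worth checking before we roll our own. If we instead keep one sidecar file per lesson, a convention like `lesson.qmd` plus `lesson.yml` is easy to check. Here's a Python sketch; the naming convention and the helper names are hypothetical.

```python
from pathlib import Path


def sidecar_metadata_path(qmd_path, suffix: str = ".yml") -> Path:
    """Hypothetical convention: lesson.qmd -> lesson.yml in the same folder."""
    return Path(qmd_path).with_suffix(suffix)


def lessons_missing_metadata(root: str) -> list:
    """List .qmd files under `root` that have no matching sidecar .yml file."""
    return sorted(
        p for p in Path(root).rglob("*.qmd")
        if not sidecar_metadata_path(p).exists()
    )
```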

cyber2a / cyber2a-course

choose metadata vocabulary for lessons #3
