cyber2a / cyber2a-course

Online materials for the Cyber2A course on AI for Arctic research
https://cyber2a.github.io/cyber2a-course/
Apache License 2.0

choose metadata vocabulary for lessons #3

Open mbjones opened 9 months ago

mbjones commented 9 months ago

We will want to distribute our lessons widely, and it will be helpful to have structured metadata for the lessons to be used in search and discovery. The ESIP Data Management Training Clearinghouse has an established metadata vocabulary for courses, and would make a good starting point for our work -- see below for an example of that metadata. But we also should evaluate metadata vocabularies used in other related initiatives, including the Carpentries (see their example format for lessons), the NEON modules, and EarthLab courses.

Here's an example metadata record from the DMTC for a DataONE lesson on metadata:

[
  {
    "title": "DataONE Data Management Module 07: Metadata",
    "status": 1,
    "pub_status": "published",
    "modification_date": "2022-06-21T10:56:57Z",
    "resource_modification_date": "1900-01-01T00:00:00Z",
    "url": "https://dataoneorg.github.io/Education/lessons/07_metadata/index.html",
    "access_cost": 0,
    "submitter_name": "Amber Budden",
    "submitter_email": "aebudden@dataone.unm.edu",
    "authors": [],
    "author_names": [],
    "author_org": {
      "name": "DataONE Community Engagement and Outreach Working Group",
      "name_identifier": "",
      "name_identifier_type": ""
    },
    "contact": {
      "name": "Amber E.  Budden",
      "org": "DataONE",
      "email": ""
    },
    "abstract_data": "What is metadata? Metadata is data (or documentation) that describes and provides context for data and it is everywhere around us. Metadata allows us to understand the details of a dataset, including: where it was collected, how it was collected, what gaps in the data mean, what the units of measurement are, who collected the data, how it should be attributed etc. By creating and providing good descriptive metadata for our own data, we enable others to efficiently discover and use the data products from our research. This lesson explores the importance of metadata to data authors, users of the data and organizations, and highlights the utility of metadata. It provides an overview of the different metadata standards that exist, and the core elements that are consistent across them; guiding users in selecting a metadata standard to work with and introduces the best practices needed for writing a high quality metadata record.&nbsp;<br />\r\nThis 30-40 minute&nbsp;lesson&nbsp;includes&nbsp;a downloadable presentation (PPT or PDF) with supporting hands-on exercise, handout, and supporting data files.<br />\r\n&nbsp;",
    "abstract_format": "filtered_html",
    "subject": "",
    "keywords": [
      "Data lifecycle",
      "Data management",
      "Metadata"
    ],
    "license": "Creative Commons 0 - CC0 \"No Rights Reserved\" (Public Domain)",
    "usage_info": "",
    "citation": "DataONE Community Engagement & Outreach Working Group (2017) \"Metadata Management\". Accessed at https://dataoneorg.github.io/Education/lessons/07_metadata/ on Jun 01, 2018",
    "locator_data": "",
    "locator_type": "",
    "publisher": "DataONE",
    "version": "v1 - 05.01.2012",
    "created": "2016-09-22T14:24:18",
    "published": "2012-05-01T00:00:00Z",
    "accessibility_features": [
      {
        "name": "Transformation - features that allow the content to be changed for ease of access, e.g., by using large print fonts."
      }
    ],
    "accessibility_summary": "",
    "language_primary": "en",
    "languages_secondary": [],
    "ed_frameworks": [
      {
        "name": "DataONE Education Modules",
        "nodes": [
          {
            "description": "",
            "name": "Describe"
          }
        ],
        "type": "framework"
      },
      {
        "name": "FAIR Data Principles",
        "nodes": [],
        "type": "framework"
      }
    ],
    "target_audience": [],
    "purpose": "Instruction - detailed information about aspects or processes related to data management or data skills.",
    "completion_time": "Up to 1 hour",
    "media_type": "Presentation - representation of the particular way in which an author shows, describes or explains one or more concepts, e.g., a set of Powerpoint slides.",
    "lr_type": "Unit -  long-range plan of instruction on a particular concept containing multiple, related lessons.",
    "creator": "sophisticus",
    "md_record_id": "",
    "ratings": [],
    "rating": 0,
    "id": "32335b19-8e6f-3772-aacc-1379d70330bb",
    "contributors": [],
    "contributor_orgs": [
      {
        "name": "DataONE",
        "name_identifier": "N.A.",
        "name_identifier_type": "N.A.",
        "type": "Final product"
      }
    ],
    "score": 4.3308573,
    "country_of_origin": null,
    "credential_status": null,
    "notes": []
  }
]

Each lesson in the DMTC has an identifier, and with it you can download the lesson metadata from the DMTC API (see the DMTC API documentation) using the following command:

curl -s "https://dmtclearinghouse.esipfed.org/api/resources/?id=32335b19-8e6f-3772-aacc-1379d70330bb" | jq .results
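
For anyone who would rather script this than use curl and jq, here is a rough Python sketch using only the standard library. The endpoint, the `id` query parameter, and the `results` key come from the command above; the function names are made up for illustration, and the response is assumed to be a JSON object with a `results` list.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
DMTC_API = "https://dmtclearinghouse.esipfed.org/api/resources/"


def build_query_url(resource_id: str) -> str:
    """Return the API URL that looks up one resource by its identifier."""
    return DMTC_API + "?" + urllib.parse.urlencode({"id": resource_id})


def extract_results(payload: str) -> list:
    """Mimic `jq .results`: parse the response body and return its results.

    Assumes the body is a JSON object with a top-level "results" key.
    """
    return json.loads(payload)["results"]


def fetch_lesson_metadata(resource_id: str) -> list:
    """Download the metadata records for one lesson (needs network access)."""
    with urllib.request.urlopen(build_query_url(resource_id)) as response:
        return extract_results(response.read().decode("utf-8"))
```

Calling `fetch_lesson_metadata("32335b19-8e6f-3772-aacc-1379d70330bb")` should then give a list of records like the one shown above, assuming the API responds the same way it does to curl.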

carmengg commented 9 months ago

Here is some information about how the Data Carpentries, NEON lectures, Earth Lab Lessons, and the Data Management Training Clearinghouse (DMTC) add metadata for archiving and organizing their modules.

Data Carpentries

Data Carpentries lessons are organized into episodes.

Each episode’s YAML header must contain the episode’s title, time estimates for teaching and exercises, motivating questions, lesson objectives, and a summary of key points.

For example, this is the YAML for the Using RMarkdown episode:

---
source: Rmd
title: "Using RMarkdown"
teaching: 10
exercises: 2
questions:
- "How to write a lesson using RMarkdown?"
objectives:
- "Explain how to use RMarkdown with the new lesson template."
- "Demonstrate how to include pieces of code, figures, and challenges."
keypoints:
- "Edit the .Rmd files not the .md files"
- "Run `make serve` to knit documents and preview lesson website locally"
---

NOTES: I really like the teaching and exercises timing information. It could also serve to distinguish between modules where participants mostly work on exercises and modules where there’s more lecturing. However, I would rename teaching and exercises to something like teaching-time and exercises-time to make it explicit that the values are durations (at first I thought exercises: 2 meant there were 2 exercises).
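
As a rough sketch of that renaming, assuming the header has already been parsed into a Python dict: the key names teaching-time and exercises-time are the ones suggested above, and `exercise_share` is a hypothetical helper for telling exercise-heavy modules from lecture-heavy ones.

```python
def rename_timing_keys(header: dict) -> dict:
    """Return a copy of an episode header with explicit timing key names."""
    renames = {"teaching": "teaching-time", "exercises": "exercises-time"}
    return {renames.get(key, key): value for key, value in header.items()}


def exercise_share(header: dict) -> float:
    """Fraction of the planned time spent on exercises (0.0 if no timings)."""
    teaching = header.get("teaching-time", 0)
    exercises = header.get("exercises-time", 0)
    total = teaching + exercises
    return exercises / total if total else 0.0


# Timing values copied from the Using RMarkdown header above.
episode = {"title": "Using RMarkdown", "teaching": 10, "exercises": 2}
renamed = rename_timing_keys(episode)
```

Here `renamed` keeps the title and carries the two timings under the new names, so `exercise_share(renamed)` is 2 out of 12.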

NEON teaching modules

NEON teaching modules are part of the Quantitative Undergraduate Biology Education and Synthesis (QUBES) lesson repository. According to their website,

The QUBES platform hosts hundreds of teaching materials, reference materials, and cloud-based software free to use and adapt using open Creative Commons licenses.

In the filters tab in the QUBES lesson browser, there are three main tags associated with each lesson. These are the tags with the corresponding subtypes:

Other less used tag categories are:

The NEON teaching modules use the first three tags and call these “alignments”. Here's an example module. Most modules have further subcategories for each alignment. For example:

The NEON teaching modules also include a citation with information about author, year of publication, title, version number and doi. Example:

Lesley Bulluck (2019). Testing hypotheses about the role of wildfire in structuring avian communities. NEON Faculty Mentoring Network, (Version 2.0). QUBES Educational Resources. doi:10.25334/R2S1-4S62

NOTES: I really liked the core three alignments: audience level, resource type and activity length. The activity length could complement the other time-tracking metadata about lecturing and exercises. I was surprised the “software used” tag wasn’t used more often. This would be good to include for us too. Having software used = None for non-technical modules could also be an option.
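
One possible way to hold these tags in code, just as a sketch: the class and field names are hypothetical, the three required fields mirror the core alignments, and an empty software list is shown as None for non-technical modules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alignments:
    """The three core alignments plus an optional list of software used."""

    audience_level: str
    resource_type: str
    activity_length: str
    software_used: List[str] = field(default_factory=list)

    def software_label(self) -> str:
        """Return the software as one string, or "None" when the list is empty."""
        return ", ".join(self.software_used) if self.software_used else "None"
```

A module with no software, for example `Alignments("Undergraduate", "Lecture", "1 hour")` with made-up values, would report its software as None without needing a special case in the metadata.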

Earth Lab Lessons

Earth Lab only hosts resources developed by Earth Lab at the University of Colorado Boulder. The lessons are organized into the following topics:

LESSONS BY TOPIC:

NOTES: Having a topic tag could be good and these are some possible values for it.

Data Management Training Clearinghouse (DMTC)

According to their website:

Data Management Training Clearinghouse (DMTC) is a registry for excellent online learning resources focusing on data skills and capacity building for research data management, data stewardship and data education.

Learning resources can be browsed using the following filters:

Filters

Additionally, these are the tags from the metadata Matt shared:

  1. title
  2. status
  3. pub_status
  4. modification_date
  5. resource_modification_date
  6. url
  7. access_cost
  8. submitter_name
  9. submitter_email
  10. authors
  11. author_names
  12. author_org
  13. contact
  14. abstract_data
  15. keywords
  16. license
  17. usage_info
  18. citation
  19. locator_data
  20. locator_type
  21. publisher
  22. version
  23. created
  24. published
  25. accessibility_features
  26. accessibility_summary
  27. language_primary
  28. languages_secondary
  29. ed_frameworks
  30. target_audience
  31. purpose
  32. completion_time
  33. media_type
  34. lr_type
  35. creator
  36. md_record_id
  37. ratings
  38. rating
  39. id
  40. contributors
  41. contributor_orgs
  42. score
  43. country_of_origin
  44. credential_status
  45. notes

NOTES: Many of the tags in the complete metadata are about submission of the module to the DMTC, so these would be needed. Others, like access_cost, license, language_primary, and languages_secondary, will probably be the same for all the modules we develop, so we could leave these out for now. keywords, target_audience, completion_time, and title are some we could add right away. Once the module becomes stable we could include url, citation and authors.
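
A small sketch of how that staging could be checked, assuming the lesson metadata is available as a Python dict. The field names come from the DMTC record; the stage names "initial" and "stable" and the function name are hypothetical.

```python
# Fields grouped by the stage at which they would be added.
STAGE_FIELDS = {
    "initial": ["title", "keywords", "target_audience", "completion_time"],
    "stable": ["url", "citation", "authors"],
}


def missing_fields(metadata: dict, stage: str) -> list:
    """List fields expected up to and including `stage` that are absent or empty."""
    if stage not in STAGE_FIELDS:
        raise ValueError(f"unknown stage: {stage!r}")
    required = list(STAGE_FIELDS["initial"])
    if stage == "stable":
        required += STAGE_FIELDS["stable"]
    # Empty strings and empty lists count as missing.
    return [name for name in required if not metadata.get(name)]
```

A draft lesson would be checked against "initial" only, and the same call with "stable" would flag url, citation and authors until they are filled in.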

carmengg commented 9 months ago

Summarized Thoughts

We could add metadata in stages. To begin, these could be some informative tags that align well with the previous learning materials repositories:

carmengg commented 9 months ago

Follow-up to Matt’s comments about Earth Lab’s metadata.

I looked through Earth Lab’s GitHub. The lessons can be found in the _posts folder. This is an example:

Rendered lesson

md file

yml header:

layout: single
category: courses
title: "Time Series Data in Python"
permalink: /courses/use-data-open-source-python/use-time-series-data-in-python/
week-landing: 1
modified: 2020-09-11
week: 1
sidebar:
  nav:
comments: false
author_profile: false
course: "intermediate-earth-data-science-textbook"
module-type: 'session'

From these I’d suggest we add ‘course’ and, maybe, ‘permalink’ (url) (at least when we have a rendered final version of the lessons).

carmengg commented 8 months ago

Hi @mbjones. Here's a sample YAML file for the sections; let me know what you think:

https://github.com/cyber2a/cyber2a-course/blob/main/section_metadata_blank.yml

It might be a bit too long to have at the beginning of each lesson. Is it possible to have a separate .yml file associated with each lesson's .qmd file?

mbjones commented 8 months ago

Good question. I don't know if you can include another file.
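
Whether or not the rendering tool can include another file, a naming convention could still tie the two together. The sketch below assumes a hypothetical convention in which each lesson's metadata sits next to its .qmd file with the same name and a .yml suffix; it only pairs and checks files, and does not feed the metadata into rendering.

```python
from pathlib import Path


def sidecar_path(qmd_file: Path) -> Path:
    """Return the metadata file paired with a lesson: same folder, same name, .yml suffix."""
    return qmd_file.with_suffix(".yml")


def lessons_missing_metadata(course_dir: Path) -> list:
    """List the .qmd lessons under course_dir that have no sidecar .yml file."""
    return sorted(
        qmd for qmd in course_dir.rglob("*.qmd") if not sidecar_path(qmd).exists()
    )
```

Running `lessons_missing_metadata(Path("."))` from the course folder would then list any lesson that still needs a metadata file.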