cyber2a / cyber2a-course

Online materials for the Cyber2A course on AI for Arctic research
https://cyber2a.github.io/cyber2a-course/
Apache License 2.0

brainstorm potential learning modules #2

Open mbjones opened 1 year ago

mbjones commented 1 year ago

Brainstorm a list of more granular learning modules, and outline a logical teaching sequence for them.

Background

In the proposal, we outlined six broad topic areas for the curriculum, and provided a description of each. While these are a great start, I also think we need to break these down into a larger number of teachable modules, and start working our way through those. This ticket is to discuss and agree on an initial plan for lesson modules, and I propose some to start with below.

We also proposed several hands-on labs to give real-world use cases to work from:

Learning module breakdown

Here are some initial module ideas, grouped by topic. Feel free to expand below.

Many more, let's start the discussion here...

carmengg commented 1 year ago

Module idea for Topic 1: Design principles for an AI-ready training dataset

These are some ideas for a module on AI-ready data. 🤔💭

Module 1: AI-ready data (~1.5 hours instructor-led during week-long workshop)

Learning Goals:

Subtopics

  1. How is data used in AI? Fundamentals of training AI models

  2. Why is high quality AI-ready data important?

    • AI is only as good as your data ("garbage in, garbage out")
    • Reduce bias
    • "Reduce time-to-insight"
    • Maximize reproducibility and impact
    • Build trust in AI applications by showing transparency
    • I liked this Intel saying: "You're not AI-ready until your data is."
  3. What is AI-ready data? Data organized, documented, and archived to facilitate using it for AI modeling. Maybe mention there is no "official" definition and several agencies and organizations are currently working on establishing standards.

    “The key motivation to produce FAIR and AI-ready datasets is to automate and streamline the creation of AI tools and approaches that leverage modern computing and scientific data infrastructure. Data should be a basic component of this, as opposed to a product that requires extensive pre-processing or feature engineering before it is adequate for computing.” FAIR and AI-ready scientific datasets - Eliu Huerta, Feb 2022

  4. Characteristics of AI-ready and non-AI ready data

    • Characteristics:
| AI-ready | non-AI-ready |
| --- | --- |
| short explanation of each term | short explanation of each term |
| representative | xxx |
| minimizes bias | xxx |
| accessible | xxx |
| consistent | xxx |
| machine-readable metadata | xxx |
| ... | ... |

There could be a small discussion where participants talk about a dataset they have used or created and whether it is AI-ready or not.
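
If it helps that discussion, here is a rough sketch of how participants could record a quick self-assessment of a dataset against the characteristics in the draft table. It is only an illustration: the `assess` helper and the example answers are hypothetical, and the list of characteristics is the incomplete one from this comment, not an agreed standard.

```python
# Characteristics copied from the draft table in this comment (list is incomplete).
CHARACTERISTICS = (
    "representative",
    "minimizes bias",
    "accessible",
    "consistent",
    "machine-readable metadata",
)


def assess(answers: dict[str, bool]) -> dict[str, list[str]]:
    """Group characteristics into met / unmet / unanswered (hypothetical helper)."""
    unknown = set(answers) - set(CHARACTERISTICS)
    if unknown:
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    result = {"met": [], "unmet": [], "unanswered": []}
    for name in CHARACTERISTICS:
        if name not in answers:
            result["unanswered"].append(name)
        elif answers[name]:
            result["met"].append(name)
        else:
            result["unmet"].append(name)
    return result


# Made-up answers for an imaginary dataset, for illustration only.
example = {"accessible": True, "consistent": False, "machine-readable metadata": True}
report = assess(example)
for status, names in report.items():
    print(f"{status}: {', '.join(names) or '-'}")
```

The "unanswered" group is kept separate on purpose, since not knowing whether a dataset is representative is itself worth talking about in the group.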

  5. Difference between Open, Analysis-ready, and AI-ready data (short)

  6. How to make your dataset AI-ready. For this section, we can discuss whether we want to follow a specific 'checklist' or 'guidelines'. This is one option: ESIP AI-ready checklist. The article AI-Ready Open Data - Bipartisan Policy Center, Feb 2023 has some good explanations of what is covered in each of the ESIP categories (quality, documentation, access, preparation).

  7. Additional considerations for AI-ready data related to people. The paper Datasheets for Datasets - Gebru et al. 2021, Communications of the ACM has a few more items about data related to people that might be worth considering in the context of Arctic research. These are in the sections:

    • Composition: questions 10-12 and extra questions 1-3
    • Collection: question 6 and extra questions 1-5

  8. Where to find AI-ready data. From Designing A Path Towards an AI-ready Data Standard - July 2023 ESIP Meeting:

    Use-Case driven:

    • Radiant MLHub
    • Foundry-ML
    • SPACEML
    • Materials Data Facility

    Data-Type Driven:

    • AI-Ready Earth Observation Training Datasets
    • ML Commons

Other references:

Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says - Forbes 2016

What is AI-ready Open Data? - NOAA Presentation Slides, Oct. 2020

Analysis Ready Data Defined - Planet Stories 2018. This has a nice account of everything that goes into pre-processing satellite data to make it analysis ready.

AI-Ready Open Data - Bipartisan Policy Center, Feb 2023. This has a comprehensive account and timeline of US government efforts to adopt AI-readiness standards.

Accelerating Artificial Intelligence Applications at Scale with AI-ready Data - ESIP 3rd and 4th presentations.

carmengg commented 1 year ago

Hi, all. I've added the potential modules into the book so we can start fleshing them out a bit more. I'll add the metadata in the coming days, but figured we could start here.

@chiayuhsu The book is made with Quarto (file extension .qmd), which is very similar to using markdown. If you haven't used it before, you'll need to install it: https://quarto.org/docs/get-started/ . I use VS Code to edit and preview the qmds and it works out great. I followed the instructions for step 2 on the Quarto website and found them quite clear.

The book will automatically re-render every time we push changes. If you want to see a preview of the render (for the whole book), you can run quarto preview in the cyber2a-course directory. If you just need to preview a single qmd, you can open it in VS Code and click the preview button (after you've installed the Quarto extension for VS Code). Please let me know if I can be of any help figuring out how to use Quarto!

Thanks!