Large Transformer Part #2

Closed · hyunwoongko closed this 2 years ago

hyunwoongko commented 2 years ago

This is my rough plan for the lecture. Please feel free to comment! cc @hunkim


  0. Lecturer Introduction

  1. Natural Language Processing
    1. NLP Basics
      1. "How does AI understand text?"
      2. "Tokenization" - Char, Morpheme, BPE
    2. RNN and Attention
      1. "Recurrent Neural Network"
      2. "Encoder and Decoder"
      3. "Attention Mechanism"
    3. Transformer
      1. "Transformer"
    4. Pre-Trained Transformers
      1. "Transfer Learning Basics"
      2. "Encoder Models" - BERT, RoBERTa, Electra
      3. "Decoder Models" - GPT1, GPT2
      4. "Encoder-Decoder Models" - BART, T5

  2. Large Transformer Models
    1. GPT-3
      1. "In-context Learning"
      2. "Prompt Programming Techniques" - Proxying, Meta Prompting
      3. "Wait, Is GPT3 a magician?" - Limitations of GPT3
    2. Beyond GPT3
      1. "Large-scale is All You Need" - MT-NLG, PaLM
      2. "Large-scale is NOT All You Need" - Gopher, Chinchilla
      3. "Mixture of Experts" - Switch Transformer
      4. "Efforts to go beyond GPT3" - Web GPT, Instruct GPT
    3. Techniques for Large Model Training
      1. "Distributed Training Basics" - torch.distributed package
      2. "Data Parallelism" - Data Parallel, Distributed Data Parallel
      3. "Model Parallelism" - Pipeline Parallel, Tensor Parallel
      4. "Optimization Techniques" - Mixed Precision, Activation Checkpointing
      5. "Zero Redundancy Optimization" - ZeRO DP, ZeRO Offload
    4. Open Source Projects
      1. NVIDIA: Megatron-LM
      2. NVIDIA & Microsoft: Megatron-DeepSpeed
      3. EleutherAI: GPT-Neo Family, OSLO, Polyglot
      4. BigScience Workshop: BLOOM
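
To make the "Distributed Training Basics" and "Data Parallelism" items concrete, here is roughly the kind of minimal DistributedDataParallel sketch I have in mind (assuming a single node with multiple GPUs launched via `torchrun`; the script name, the tiny `nn.Linear` model, and the hyperparameters are placeholders, not part of the actual material):

```python
# Minimal DDP sketch (illustrative only).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_demo.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process holds a full replica of the (toy) model.
    model = nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()  # gradients are all-reduced across processes here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```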
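
For the "Optimization Techniques" item, a rough sketch of how I would show mixed precision (`torch.cuda.amp`) together with activation checkpointing (`torch.utils.checkpoint`); the toy residual blocks and sizes below are placeholders for illustration only:

```python
# Mixed precision + activation checkpointing sketch (single GPU, illustrative only).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)


class ToyModel(nn.Module):
    def __init__(self, dim=1024, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])

    def forward(self, x):
        for block in self.blocks:
            # Activations inside each block are recomputed during backward
            # instead of being stored, trading extra compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x


model = ToyModel().cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(8, 1024, device="cuda")
    with torch.cuda.amp.autocast():  # eligible ops run in half precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()
```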
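
And for the "Zero Redundancy Optimization" item, a sketch of the kind of DeepSpeed ZeRO configuration I would walk through; the script name, the placeholder model, and all numeric values are illustrative, and the script is assumed to be launched with the `deepspeed` launcher:

```python
# DeepSpeed ZeRO-DP + ZeRO-Offload sketch (illustrative only).
# Launch with: deepspeed zero_demo.py
import deepspeed
import torch
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                              # partition optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: keep optimizer states on CPU
    },
}

model = nn.Linear(1024, 1024)  # placeholder model
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(8, 1024, device=model_engine.device).half()
    loss = model_engine(x).pow(2).mean()
    model_engine.backward(loss)  # DeepSpeed handles loss scaling and gradient partitioning
    model_engine.step()
```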

hyunwoongko commented 2 years ago

I have a few questions.

  1. How familiar are the students taking this course with NLP? If possible, it would be better to omit much of Chapter 1 (Natural Language Processing). Personally, I think those topics would fit better in the Hugging Face related lecture, which would let me focus more on the engineering part. How about changing the order of the lectures?

  2. Since the subject is "Large-scale Transformer", I added many related training techniques to Chapter 2. However, considering that this is an MLOps course, it might be better to cover inference optimization techniques such as ONNX, TensorRT, and Triton Inference Server rather than those training techniques. Do we currently have any lectures covering that part? FYI, I can teach it, because I am very familiar with those techniques. (If we handle the NLP part in the Hugging Face lecture, I think this will be possible.)

  3. The lecture plan is still very rough, so it contains most of the content related to large-scale Transformers, but I think it would be hard to cover all of it in 2 hours. Would you prefer narrow but deep content, or broad but shallow content?