I have a few questions.
How well do the students taking this course understand NLP? If possible, it would be better to omit much of the Chapter 1 (Natural Language Processing) part. Personally, I think it would be better to introduce that material in the Hugging Face related lecture; then I can focus more on the engineering part. How about changing the order of the lectures?
Since the subject is "Large-scale Transformer", I added many related techniques to Chapter 2. However, considering that this is an MLOps course, it might be better to include inference optimization techniques such as ONNX, TensorRT, and Triton Inference Server rather than such training techniques. Do we currently have any lectures covering this part? FYI, I can teach that part because I am very familiar with these techniques. (If we cover the NLP part in the Hugging Face lecture, I think this would be possible.)
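For context, this is the kind of inference-optimization content I have in mind: a minimal sketch of exporting a Hugging Face model to ONNX with `torch.onnx.export` and running it with ONNX Runtime. The model name, opset version, and shapes are placeholders I picked for illustration, not part of the lecture plan.

```python
# Minimal sketch: export a Hugging Face encoder to ONNX and run it with ONNX Runtime.
# The model name and input text are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import onnxruntime as ort

model_name = "distilbert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.config.return_dict = False  # return plain tuples so the exporter sees tensor outputs
model.eval()

# Trace and export the model with dynamic batch and sequence axes.
inputs = tokenizer("Large-scale transformers", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Run the exported graph with ONNX Runtime on CPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(
    ["logits"],
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)[0]
print(logits.shape)
```

In the lecture this would be the starting point before moving on to TensorRT engines and serving with Triton Inference Server.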
The lecture plan is still very rough, so it contains most of the content related to large-scale Transformers. But I think it could be hard to cover all of this in 2 hours. Do you want narrow, deep content, or broad, shallow content?
This is my rough plan for the lecture. Please feel free to comment! cc @hunkim
Chapter 1 (Natural Language Processing)
- Char, Morpheme, BPE
- BERT, RoBERTa, Electra
- GPT-1, GPT-2
- BART, T5
- Proxying, Meta Prompting
- Limitations of GPT-3
- MT-NLG, PaLM
- Gopher, Chinchilla
- Switch Transformer
- WebGPT, InstructGPT

Chapter 2 (Large-scale Transformer)
- torch.distributed package
- Data Parallel, Distributed Data Parallel (see the sketch after this list)
- Pipeline Parallel, Tensor Parallel
- Mixed Precision, Activation Checkpointing
- ZeRO DP, ZeRO Offload
- Megatron-LM
- Megatron-DeepSpeed
- GPT-Neo family, OSLO, Polyglot
- BLOOM
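To give a feel for a few of the Chapter 2 topics, here is a minimal sketch of DistributedDataParallel training with mixed precision and activation checkpointing in plain PyTorch. The toy model, batch size, and data are placeholders I made up for illustration; the actual lecture code would use a real Transformer and the Megatron/DeepSpeed tooling listed above.

```python
# Minimal sketch of DDP + mixed precision + activation checkpointing.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_sketch.py
# The toy model, hyperparameters, and random data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint


class Block(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        # Activation checkpointing: recompute this block during backward
        # instead of storing its intermediate activations.
        return x + checkpoint(self.ff, x, use_reentrant=False)


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(*[Block(1024) for _ in range(4)]).cuda(local_rank)
    # Data parallelism: each rank holds a replica, gradients are all-reduced.
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 mixed precision

    for step in range(10):
        x = torch.randn(8, 1024, device=local_rank)  # placeholder data
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The remaining items (pipeline/tensor parallelism, ZeRO, Megatron-LM, Megatron-DeepSpeed) build on this same torch.distributed foundation, which is why I put it first in Chapter 2.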