ai-computing/aicomp
Other · 13 stars · 1 fork
Issues (newest first)
#26 · Enhancing the universality of the checking logic for Hugging Face model support · ememos · opened 2 weeks ago · 0 comments
#25 · Automatic Mixed Precision Applied to Model Operations · ememos · opened 4 weeks ago · 0 comments
#24 · Support for Variable Batch Sizes Needed in Minibatch Processing · gspark-etri · opened 1 month ago · 0 comments
#23 · Exception handling in the case where only one process is running · ememos · closed 4 days ago · 1 comment
#22 · Index error in prepare_labels() · ememos · closed 4 days ago · 1 comment
#21 · Llama-2-13B model running error on 2 HOSTs with 4 GPUs · ememos · opened 3 months ago · 1 comment
#20 · Llama model support · ememos · closed 4 days ago · 2 comments
#19 · Confusion Regarding optimizer_offload and mds_offload Options · ememos · closed 4 days ago · 1 comment
#18 · Handling OOM during the optimizer.step() phase · ememos · closed 4 days ago · 2 comments
#17 · CPU memory usage · ememos · closed 4 days ago · 1 comment
#16 · CPU memory OOM during large model training · ememos · closed 4 days ago · 1 comment
#15 · Support for optimizer state offloading · ememos · closed 4 days ago · 1 comment
#14 · Upload README image · baiksong · closed 6 months ago · 0 comments
#13 · Memory leak in gptj 6B model training · ememos · closed 4 days ago · 1 comment
#12 · Aggressive memory cleaning support · ememos · closed 4 days ago · 1 comment
#11 · Support of GPU Memory Size Print During Learning · ememos · closed 4 days ago · 1 comment
#10 · Activation checkpoint support · ememos · closed 4 days ago · 1 comment
#9 · Elevate the level of abstraction from rank to stage in user code · ememos · closed 4 days ago · 1 comment
#8 · DP + PP support · ememos · closed 4 days ago · 1 comment
#7 · Feature to send the label to the last stage · ememos · closed 4 days ago · 1 comment
#6 · 1F1B scheduling support · ememos · closed 4 days ago · 2 comments
#5 · Need Examples and README for Refactored System · ememos · closed 4 days ago · 1 comment
#4 · Code refactoring · ememos · closed 4 days ago · 2 comments
#3 · Failed to train large models that exceed the aggregated size of all GPU memories · ememos · closed 4 days ago · 1 comment
#2 · Increasing the Number of GPT-2-like Models Operable in the High-level IR Execution Framework · ememos · closed 4 days ago · 2 comments
#1 · Preparing an alternative high-level IR execution framework for the training code · ememos · closed 4 days ago · 2 comments