facebookresearch/fairscale
PyTorch extensions for high performance and large scale training.
3.13k stars · 274 forks
Issues
Support for gradient accumulation (#1190) · ngoyal2707 · opened 6 days ago · 0 comments
Group division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py (#1189) · Youngluc · opened 6 days ago · 0 comments
`assert param.grad is not None` raised when finetuning LoRA (#1188) · HashimotoPatrickMu · opened 2 months ago · 1 comment
Bump scikit-learn from 1.1.3 to 1.5.0 (#1187) · dependabot[bot] · opened 2 months ago · 0 comments
[FSDPv1] Optimize memory usage for optimize_backward_concat=True (#1186) · chrisxcai · closed 2 months ago · 0 comments
FP8 AllGather Support in Fairscale (#1185) · levendlee · opened 3 months ago · 0 comments
[FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1184) · chrisxcai · closed 3 months ago · 0 comments
Llama4 FP8 Training Debug - fairscale (#1183) · jiecaoyu · opened 4 months ago · 0 comments
Add timeout in initialize_model_parallel (#1182) · vladmihailescu · closed 4 months ago · 0 comments
Fix minor grammatical errors in docs (#1181) · aakashapoorv · opened 4 months ago · 0 comments
[FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1180) · chrisxcai · opened 4 months ago · 0 comments
Updated the README file (#1179) · KPCOFGS · closed 4 months ago · 0 comments
[WIP] Make FSDPv1 only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1178) · chrisxcai · opened 4 months ago · 0 comments
sync fbcode cp pg initialize (#1177) · amylittleyang · closed 4 months ago · 0 comments
add get_cp_ranks to model_parallel initialize (#1176) · amylittleyang · closed 5 months ago · 0 comments
Add cast input argument (#1175) · whbldhwj · closed 5 months ago · 0 comments
add context parallel group init to mp init (#1174) · amylittleyang · closed 5 months ago · 0 comments
Make sure that tensor is contiguous before gathering across processes (#1173) · patrickvonplaten · opened 5 months ago · 0 comments
[question] Different training between DDP & Sharded DDP (#1172) · kwohlfahrt · opened 5 months ago · 0 comments
Added requires_grad check for params_with_grad method (#1171) · whbldhwj · closed 5 months ago · 0 comments
What are pointwise optimizers and non-pointwise optimizers? (#1170) · bugm · closed 5 months ago · 4 comments
Bump black from 22.3.0 to 24.3.0 (#1169) · dependabot[bot] · opened 5 months ago · 0 comments
Fairscale support for only performing allreduce in last microbatch (#1168) · jiecaoyu · opened 5 months ago · 0 comments
Fix params_with_grad in FSDP when the model has frozen parameters (#1167) · whbldhwj · opened 6 months ago · 0 comments
Changed to only run reshard hook if all gradients computed (#1166) · awgu · closed 5 months ago · 0 comments
Example of MOE (#1165) · Juanhui28 · opened 6 months ago · 1 comment
Avoid calling _free_fp16_param_shard() too early (#1164) · jiecaoyu · opened 6 months ago · 2 comments
FSDP on the same CNN model requires more memory than DataParallel (#1163) · s-reaungamornrat · closed 5 months ago · 0 comments
Should assign norm_type instead of scale_grad_by_freq (#1162) · brad-mengchi · closed 7 months ago · 1 comment
added option for no PG validation for faster init (#1161) · ngoyal2707 · closed 7 months ago · 0 comments
ci: Use GITHUB_OUTPUT envvar instead of set-output command (#1160) · arunsathiya · opened 7 months ago · 1 comment
Added reshard hook for frozen params in backward (#1159) · awgu · opened 7 months ago · 5 comments
Add support for `torch.set_default_device` when initializing model parameters (#1158) · fshp971 · opened 8 months ago · 0 comments
Assign self.norm_type to input norm_type (#1157) · gtamer2 · closed 7 months ago · 1 comment
Issue in `ParallelEmbedding` constructor - scale_grad_by_freq being assigned to norm_type (#1156) · gtamer2 · closed 7 months ago · 2 comments
How can I use torchrun + model parallelism + FSDP? (#1155) · HackGiter · opened 9 months ago · 1 comment
fixed broken clipping (#1154) · ngoyal2707 · closed 9 months ago · 0 comments
fix .grad=None issue when param is not sharded (#1153) · jiecaoyu · closed 9 months ago · 0 comments
changes to keep reduced grad in fp32 (#1152) · vedanuj · closed 9 months ago · 0 comments
[not to be merged yet] added temp changes for fp32 main grad, might not work for TE (#1151) · ngoyal2707 · closed 9 months ago · 0 comments
fix no shard case (#1150) · artkorenev · closed 9 months ago · 0 comments
Fix _free_full_params() (#1149) · hadasah · opened 9 months ago · 0 comments
Extend CheckpointFunction to track all tensor input/output (#1148) · 000Justin000 · opened 10 months ago · 0 comments
[Not for merge] fp8allgather debug (#1147) · jiecaoyu · opened 10 months ago · 0 comments
It is dangerous to use the default non_block=True (#1146) · heshenghuan · opened 10 months ago · 0 comments
torch.compile with FSDP (#1145) · santha96 · closed 10 months ago · 2 comments
Added fns for manual free, reduce-scatter; removed stream sync if event sync (#1144) · awgu · closed 10 months ago · 1 comment
Cleared backward hooks to avoid accumulating over iterations (#1143) · awgu · closed 11 months ago · 0 comments
Add main grad before fwd pass (#1142) · vedanuj · opened 11 months ago · 2 comments
Removed extra `cat` before reduce-scatter (#1141) · awgu · closed 11 months ago · 1 comment