facebookresearch/fairscale
PyTorch extensions for high performance and large scale training.
3.13k stars · 274 forks
Issues
Support for gradient accumulation (#1190) · ngoyal2707 · opened 6 days ago · 0 comments
Group division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py (#1189) · Youngluc · opened 6 days ago · 0 comments
`assert param.grad is not None` raised when finetuning LoRA (#1188) · HashimotoPatrickMu · opened 2 months ago · 1 comment
Bump scikit-learn from 1.1.3 to 1.5.0 (#1187) · dependabot[bot] · opened 2 months ago · 0 comments
[FSDPv1] Optimize memory usage for optimize_backward_concat=True (#1186) · chrisxcai · closed 2 months ago · 0 comments
FP8 AllGather Support in Fairscale (#1185) · levendlee · opened 3 months ago · 0 comments
[FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1184) · chrisxcai · closed 3 months ago · 0 comments
Llama4 FP8 Training Debug - fairscale (#1183) · jiecaoyu · opened 4 months ago · 0 comments
Add timeout in initialize_model_parallel (#1182) · vladmihailescu · closed 4 months ago · 0 comments
Fix minor grammatical errors in docs (#1181) · aakashapoorv · opened 4 months ago · 0 comments
[FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1180) · chrisxcai · opened 4 months ago · 0 comments
Updated the README file (#1179) · KPCOFGS · closed 4 months ago · 0 comments
[WIP] Make FSDPv1 only perform cat() during last microbatch backward() within FlattenParamsWrapper (#1178) · chrisxcai · opened 4 months ago · 0 comments
sync fbcode cp pg initialize (#1177) · amylittleyang · closed 4 months ago · 0 comments
add get_cp_ranks to model_parallel initialize (#1176) · amylittleyang · closed 5 months ago · 0 comments
Add cast input argument (#1175) · whbldhwj · closed 5 months ago · 0 comments
add context parallel group init to mp init (#1174) · amylittleyang · closed 5 months ago · 0 comments
Make sure that tensor is contiguous before gathering across processes (#1173) · patrickvonplaten · opened 5 months ago · 0 comments
[question] Different training between DDP & Sharded DDP (#1172) · kwohlfahrt · opened 5 months ago · 0 comments
Added requires_grad check for params_with_grad method (#1171) · whbldhwj · closed 5 months ago · 0 comments
What are pointwise optimizers and non-pointwise optimizers? (#1170) · bugm · closed 5 months ago · 4 comments
Bump black from 22.3.0 to 24.3.0 (#1169) · dependabot[bot] · opened 5 months ago · 0 comments
Fairscale support for only performing allreduce in last microbatch (#1168) · jiecaoyu · opened 5 months ago · 0 comments
Fix params_with_grad in FSDP when the model has frozen parameters (#1167) · whbldhwj · opened 6 months ago · 0 comments
Changed to only run reshard hook if all gradients computed (#1166) · awgu · closed 5 months ago · 0 comments
Example of MOE (#1165) · Juanhui28 · opened 6 months ago · 1 comment
Avoid calling _free_fp16_param_shard() too early (#1164) · jiecaoyu · opened 6 months ago · 2 comments
FSDP on the same CNN model requires more memory than DataParallel (#1163) · s-reaungamornrat · closed 5 months ago · 0 comments
Should assign norm_type instead of scale_grad_by_freq (#1162) · brad-mengchi · closed 7 months ago · 1 comment
added option for no PG validation for faster init (#1161) · ngoyal2707 · closed 7 months ago · 0 comments
ci: Use GITHUB_OUTPUT envvar instead of set-output command (#1160) · arunsathiya · opened 7 months ago · 1 comment
Added reshard hook for frozen params in backward (#1159) · awgu · opened 7 months ago · 5 comments
Add support for `torch.set_default_device` when initializing model parameters (#1158) · fshp971 · opened 8 months ago · 0 comments
Assign self.norm_type to input norm_type (#1157) · gtamer2 · closed 7 months ago · 1 comment
Issue in `ParallelEmbedding` constructor - scale_grad_by_freq being assigned to norm_type (#1156) · gtamer2 · closed 7 months ago · 2 comments
How can I use torchrun + model parallelism + FSDP? (#1155) · HackGiter · opened 9 months ago · 1 comment
fixed broken clipping (#1154) · ngoyal2707 · closed 9 months ago · 0 comments
fix .grad=None issue when param is not sharded (#1153) · jiecaoyu · closed 9 months ago · 0 comments
changes to keep reduced grad in fp32 (#1152) · vedanuj · closed 9 months ago · 0 comments
[not to be merged yet] added temp changes for fp32 main grad, might not work for TE (#1151) · ngoyal2707 · closed 9 months ago · 0 comments
fix no shard case (#1150) · artkorenev · closed 9 months ago · 0 comments
Fix _free_full_params() (#1149) · hadasah · opened 9 months ago · 0 comments
Extend CheckpointFunction to track all tensor input/output (#1148) · 000Justin000 · opened 10 months ago · 0 comments
[Not for merge] fp8allgather debug (#1147) · jiecaoyu · opened 10 months ago · 0 comments
It is dangerous to use the default non_block=True (#1146) · heshenghuan · opened 10 months ago · 0 comments
torch.compile with FSDP (#1145) · santha96 · closed 10 months ago · 2 comments
Added fns for manual free, reduce-scatter; removed stream sync if event sync (#1144) · awgu · closed 10 months ago · 1 comment
Cleared backward hooks to avoid accumulating over iterations (#1143) · awgu · closed 11 months ago · 0 comments
Add main grad before fwd pass (#1142) · vedanuj · opened 11 months ago · 2 comments
Removed extra `cat` before reduce-scatter (#1141) · awgu · closed 11 months ago · 1 comment