facebookresearch / dlrm
An implementation of a deep learning recommendation model (DLRM)
MIT License · 3.71k stars · 825 forks
Issues (newest first)
#339  Remove syncing logic from train pipeline and use single pipeline in DLRM (joshuadeng, closed 1 year ago, 10 comments)
#338  Get Criteo Kaggle dataset working with TorchRec-based DLRM (samiwilf, closed 1 year ago, 6 comments)
#337  Align Adagrad's eps parameter for embeddings and dense layers (janekl, closed 1 year ago, 1 comment)
#336  Update torchrec_dlrm/README.md with instructions to replicate MLPerf DLRM v1 settings; add DLRM v1 preprocessing script previously deleted unintentionally (samiwilf, closed 1 year ago, 4 comments)
#335  Change README file to include link to blog post (hjmshi, closed 1 year ago, 1 comment)
#334  Are there model weights available for DLRM v2? (mailvijayasingh, closed 1 year ago, 4 comments)
#333  [Q] Handling Boolean features (avnish-wynk, closed 1 year ago, 4 comments)
#332  Accuracy discrepancy between TorchRec and PyTorch DLRMs (allenfengjr, closed 1 year ago, 3 comments)
#331  Update torchrec_dlrm README (samiwilf, closed 1 year ago, 1 comment)
#330  Let OSS/Docker use 0.11.0 while 0.10.3 is used internally; code is compatible with both, so this should work (samiwilf, closed 1 year ago, 2 comments)
#329  Add MLPerf logging + amendments (janekl, closed 1 year ago, 1 comment)
#328  Change drop_last algorithm so it's cleaner and doesn't require a last batch of size up to 2*batch_size (samiwilf, closed 1 year ago, 3 comments)
#327  Use torchmetrics==0.10.3 because it was stable and worked; torcheval has some issues (samiwilf, closed 1 year ago, 1 comment)
#326  Loss is way too high when applying QR embedding with add operation (YoungsukKim12, closed 1 year ago, 9 comments)
#325  Change dataset traversal so all ranks start from consecutive batches at beginning of dataset (#930) (samiwilf, closed 1 year ago, 10 comments)
#324  Compute AUROC across ranks correctly (janekl, closed 1 year ago, 1 comment)
#323  Opt dlrm into black for auto-formatting (colin2328, closed 1 year ago, 3 comments)
#322  Apply lintrunner & edit docstrings (janekl, closed 1 year ago, 1 comment)
#321  Update README to include TorchRec tutorial; add comment linking to FBGEMM fused Adagrad call with explanation (colin2328, closed 1 year ago, 5 comments)
#320  Update comments in multi_hot_criteo.py (samiwilf, closed 1 year ago, 1 comment)
#319  Docs + a few edits (janekl, closed 1 year ago, 1 comment)
#318  Finalize Dockerfile & requirements.txt (janekl, closed 1 year ago, 1 comment)
#317  Requirements pin fix (janekl, closed 1 year ago, 1 comment)
#316  Pin dlrmv2 to torchrec (and fbgemm) v0.3.2 (colin2328, closed 1 year ago, 1 comment)
#315  Add in_backward_optimizer_filter to work with in_backward_optimizers (#892) (colin2328, closed 1 year ago, 1 comment)
#314  Compute AUROC using torcheval (janekl, closed 1 year ago, 1 comment)
#313  Add __len__ method to RestartableMap (bugfix) (janekl, closed 1 year ago, 2 comments)
#312  Change --drop_last to --drop_last_training_batch, applied only to the… (samiwilf, closed 1 year ago, 5 comments)
#311  Can't install TorchRec on GCP (zzh1024, closed 1 year ago, 2 comments)
#310  Add support for dropping last non-full batch (samiwilf, closed 1 year ago, 3 comments)
#309  Make PipelinedForward syncing transparent to caller (samiwilf, closed 1 year ago, 1 comment)
#308  Fix train/val/test for model using multiple train pipelines (joshuadeng, closed 1 year ago, 1 comment)
#307  Make pg, topology, and sharders optional to the planner (colin2328, closed 1 year ago, 1 comment)
#306  Add support for materializing and reading materialized 1tb criteo mul… (samiwilf, closed 1 year ago, 6 comments)
#305  AUROC calculation with the latest torchmetrics==0.11.0 (janekl, closed 1 year ago, 3 comments)
#304  Add support for materializing and reading materialized 1tb criteo multi-hot dataset (samiwilf, closed 1 year ago, 2 comments)
#303  Decouple train/val/test code by using separate pipelines for each; remove --change_lr since --lr_scheduler can perform the same behavior (samiwilf, closed 1 year ago, 6 comments)
#302  Add --print_sharding_plan option to torchrec_dlrm/dlrm_main.py (samiwilf, closed 1 year ago, 2 comments)
#301  Make DLRM symbolically traceable with FX, and fix Python version check (vkuzo, closed 1 year ago, 2 comments)
#300  Make pg, topology, and sharders optional to the planner (colin2328, closed 1 year ago, 7 comments)
#299  Flag for enabling TF32 mode for A100 (janekl, closed 1 year ago, 2 comments)
#298  Remove variable batch size from EBC init in DLRM (joshuadeng, closed 1 year ago, 1 comment)
#297  How to do asynchronous distributed training with DLRM? (PavithranRick, closed 1 year ago, 2 comments)
#296  Add support for in-memory Criteo training-set shuffle; add supporting unit tests (samiwilf, closed 1 year ago, 4 comments)
#295  Add MIT license to ai_codesign/dlrm (colin2328, closed 1 year ago, 3 comments)
#294  Use vanilla Adagrad instead of row-wise Adagrad for reference implementation (colin2328, closed 1 year ago, 5 comments)
#293  mini-batch-size and num-batches in relation to global sample number (JasonFantl, closed 1 year ago, 3 comments)
#292  distributed_launch doesn't work with Terabyte dataset (gakolhe, closed 1 year ago, 2 comments)
#291  Decouple train/val/test code by using separate pipelines for each; remove --change_lr since --lr_scheduler can perform the same behavior (samiwilf, closed 1 year ago, 11 comments)
#290  Remove dependence on torch.distributed.algorithms.join; instead, size batches so that all ranks always have the same num_batches, increasing batch sizes by 1 sample when necessary to keep num_batches equal across ranks (samiwilf, closed 1 year ago, 6 comments)