facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)

Size of embedding tables in MLPerf checkpoint #369

Open AlCatt91 opened 9 months ago

AlCatt91 commented 9 months ago

Hello, I am looking at the pre-trained weights for the MLPerf benchmark configuration on Criteo Terabyte that are linked in the README (link). If I understand correctly, this should be the best checkpoint produced by the configuration run with the script ./bench/run_and_time.sh. Based on the code snippet

if args.max_ind_range > 0:
    ln_emb = np.array(
        list(
            map(
                lambda x: x if x < args.max_ind_range else args.max_ind_range,
                ln_emb,
            )
        )
    )

and since that config uses --max-ind-range=40000000, I was expecting the largest embedding tables (namely, tables 0, 9, 19, 20, 21) to be capped at exactly 40M rows. However, the row counts of these tensors in the state_dict of the downloaded checkpoint are more varied than that (listed below as read from the checkpoint; see the sketch after the list):

**emb_l.0.weight: torch.Size([39884406, 128])**
emb_l.1.weight: torch.Size([39043, 128])
emb_l.2.weight: torch.Size([17289, 128])
emb_l.3.weight: torch.Size([7420, 128])
emb_l.4.weight: torch.Size([20263, 128])
emb_l.5.weight: torch.Size([3, 128])
emb_l.6.weight: torch.Size([7120, 128])
emb_l.7.weight: torch.Size([1543, 128])
emb_l.8.weight: torch.Size([63, 128])
**emb_l.9.weight: torch.Size([38532951, 128])**
emb_l.10.weight: torch.Size([2953546, 128])
emb_l.11.weight: torch.Size([403346, 128])
emb_l.12.weight: torch.Size([10, 128])
emb_l.13.weight: torch.Size([2208, 128])
emb_l.14.weight: torch.Size([11938, 128])
emb_l.15.weight: torch.Size([155, 128])
emb_l.16.weight: torch.Size([4, 128])
emb_l.17.weight: torch.Size([976, 128])
emb_l.18.weight: torch.Size([14, 128])
**emb_l.19.weight: torch.Size([39979771, 128])**
**emb_l.20.weight: torch.Size([25641295, 128])**
**emb_l.21.weight: torch.Size([39664984, 128])**
emb_l.22.weight: torch.Size([585935, 128])
emb_l.23.weight: torch.Size([12972, 128])
emb_l.24.weight: torch.Size([108, 128])
emb_l.25.weight: torch.Size([36, 128])
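
(For reference, I read these sizes straight from the checkpoint, roughly as below; the file name is just a placeholder for wherever the downloaded checkpoint is stored.)

```python
import torch

# Placeholder path: wherever the downloaded MLPerf checkpoint lives on disk
ckpt = torch.load("dlrm_terabyte_mlperf.pt", map_location="cpu")

for name, tensor in ckpt["state_dict"].items():
    if name.startswith("emb_l."):
        print(f"{name}: {tuple(tensor.shape)}")
```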

How does the hashing work for this model? It cannot simply be the categorical value ID taken modulo 40M, as in the released PyTorch code. Moreover, some of the smaller embedding tables also appear to have been reduced in size, which suggests additional custom filtering or merging of the categorical values?
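
To spell out what I mean by "modulo 40M": my reading of the released preprocessing is plain modulo hashing of the raw categorical IDs, roughly as sketched below (a simplification, not the exact code path). Under that scheme, every feature with more than 40M raw values should end up with an embedding table of exactly 40,000,000 rows, which is not what the sizes above show.

```python
import numpy as np

max_ind_range = 40_000_000

def remap_ids(x_cat: np.ndarray) -> np.ndarray:
    # Fold every raw categorical ID into [0, max_ind_range).
    # A feature with more than 40M distinct raw values would then
    # always require an embedding table of exactly 40M rows.
    return x_cat % max_ind_range
```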

Also, I do not see a test_auc key in the checkpoint dictionary, despite --mlperf-logging being set in ./bench/run_and_time.sh: what is the test AUC of this pre-trained model?
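
(In case it is useful, this is roughly how I looked for it; the path is again a placeholder, and I am only guessing which metadata keys should be present.)

```python
import torch

ckpt = torch.load("dlrm_terabyte_mlperf.pt", map_location="cpu")  # placeholder path

# Top-level entries other than the weights: training metadata, losses, metrics, ...
meta_keys = sorted(k for k in ckpt if k != "state_dict")
print(meta_keys)
print(ckpt.get("test_auc", "no test_auc key in this checkpoint"))
```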