NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

sparsity test part1 failed #1659

Open jackzhou121 opened 1 year ago

jackzhou121 commented 1 year ago

Running the checkpointing_test_part1.py test case in sparsity fails.

error log:

[compute_sparse_masks] build offline permutation graph on none-distributed model.
[compute_sparse_masks] Take 0.0007 seconds to finish build_offline_permutation_graph function.
[compute_sparse_masks] skip applying offline permutation because there is no valid offline_permutation_fx_graph.
Traceback (most recent call last):
  File "sparsity_part1.py", line 94, in <module>
    main(args)
  File "sparsity_part1.py", line 58, in main
    ASP.compute_sparse_masks()
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/asp.py", line 253, in compute_sparsemasks
    mask.set(cls.__calculate_mask(p))
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/asp.py", line 89, in create_mask_from_pattern
    return create_mask(param, mask_calculator).bool()
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/sparse_masklib.py", line 162, in create_mask
    mask = func(t, density)
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/sparse_masklib.py", line 141, in m4n2_2d_best
    return mn_2d_best(mat, 4, 2)
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/sparse_masklib.py", line 124, in mn_2d_best
    patterns = compute_valid_2d_patterns(m,n).cuda()
  File "/usr/local/lib/python3.8/dist-packages/apex/contrib/sparsity/sparse_masklib.py", line 112, in compute_valid_2d_patterns
    patterns = torch.empty(list(set(permutations(patterns,m))))
TypeError: empty(): argument 'size' (position 1) must be tuple of ints, but found element of type tuple at pos 0
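The TypeError says that torch.empty() treats its first argument as a shape (a tuple of ints), while the failing line passes it a list of tuples that looks like pattern data. As a rough illustration of what that data probably is, here is a minimal pure-Python sketch that enumerates m x m 0/1 masks with exactly n ones per row and at most n ones per column. The function name valid_2d_patterns and the use of itertools.product are my own assumptions for illustration; this is not the apex implementation.

```python
from itertools import permutations, product


def valid_2d_patterns(m, n):
    """Enumerate m x m 0/1 masks with exactly n ones per row
    and at most n ones per column (hypothetical helper, not apex code)."""
    # All distinct rows of length m that contain exactly n ones.
    rows = sorted(set(permutations([1] * n + [0] * (m - n))))
    patterns = []
    for candidate in product(rows, repeat=m):
        # zip(*candidate) iterates over the columns of the candidate mask.
        if all(sum(col) <= n for col in zip(*candidate)):
            patterns.append(candidate)
    return patterns


patterns_4_2 = valid_2d_patterns(4, 2)
print(len(patterns_4_2))
```

Nested tuples like these are values, not sizes, so a constructor that takes data, for example torch.tensor(patterns_4_2, dtype=torch.float), would accept them where torch.empty() does not. Whether that is the change the maintainers intend for sparse_masklib.py is an assumption on my part.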

Environment