## Description

The old code did not check the return values of the read requests. When a read is short (N bytes are requested but only M < N bytes are read), we have to continue from where the read left off and submit another request for the remaining bytes. With that fixed, we can now write a regression test that measures speed first and verifies correctness only after the timing is recorded, so the check does not affect the measurement.
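The actual fix lives in the C++ read path; the snippet below is only a minimal Python sketch of the resubmit-on-short-read pattern using `os.pread`, and the `pread_full` helper name is hypothetical, not part of this PR:

```python
import os


def pread_full(fd, n, offset):
    # Hypothetical illustration: os.pread may return fewer bytes than
    # requested (a short read), so keep submitting reads that continue
    # from where the previous one left off until all n bytes arrive.
    buf = bytearray(n)
    done = 0
    while done < n:
        chunk = os.pread(fd, n - done, offset + done)
        if not chunk:
            raise EOFError("unexpected end of file")
        buf[done : done + len(chunk)] = chunk
        done += len(chunk)
    return bytes(buf)
```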
Regression test:
```python
import os
import tempfile
import time
from functools import partial

import torch

import dgl.graphbolt as gb

# Exact comparison: the bytes read back from disk must match the tensor.
assert_equal = partial(torch.testing.assert_close, rtol=0, atol=0)


def to_on_disk_numpy(test_dir, name, t):
    path = os.path.join(test_dir, name + ".npy")
    gb.numpy_save_aligned(path, t.numpy())
    return path


def test_index_select_throughput_and_iops(shape, dtype, indices, num_threads_list):
    tensor = torch.randint(0, 127, shape, dtype=dtype)
    skip_first = 10  # Treat the first few batches as warm-up.
    results = []
    IOPSs = []
    with tempfile.TemporaryDirectory() as test_dir:
        path = to_on_disk_numpy(test_dir, "tensor", tensor)
        for num_threads in num_threads_list:
            feature = gb.DiskBasedFeature(path=path, num_threads=num_threads)
            throughput_sum = 0
            iops_sum = 0
            for i, idx in enumerate(indices):
                start = time.time()
                result = feature.read(idx)
                duration = time.time() - start
                # Check correctness only after the timing has been recorded
                # so that the check does not affect the measurement.
                assert_equal(result, tensor[idx])
                if i >= skip_first:
                    throughput_sum += result.nbytes / duration
                    iops_sum += idx.numel() / duration
            throughput = throughput_sum / (len(indices) - skip_first)
            iops = iops_sum / (len(indices) - skip_first)
            print(num_threads, int(throughput / (2**20)), "MiB/s", int(iops), "IOPS")
            results.append(throughput)
            IOPSs.append(iops)
    return results, IOPSs


shape = [2500000, 4096]
dtype = torch.int8
batch_size = 100000
indices = [
    torch.randint(0, shape[0], [batch_size], dtype=torch.int32) for _ in range(25)
]
num_threads_list = list(range(1, 9))
throughputs, IOPSs = test_index_select_throughput_and_iops(
    shape, dtype, indices, num_threads_list
)
print(
    list(
        (num_threads, int(throughput / (2**20)), int(iops))
        for num_threads, throughput, iops in zip(num_threads_list, throughputs, IOPSs)
    )
)
```
Benchmark results with 4K byte feature dimension:

## Checklist

Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
- [ ] I've leveraged the tools to beautify the Python and C++ code.
- [ ] The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
- [ ] All changes have test coverage.
- [ ] Code is well-documented.
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change.
- [ ] Related issue is referred to in this PR.
- [ ] If the PR is for a new model/paper, I've updated the example index here.
`@dgl-bot run [instance-type] [which tests] [compare-with-branch]`; for example: `@dgl-bot run g4dn.4xlarge all dmlc/master` or `@dgl-bot run c5.9xlarge kernel,api dmlc/master`
## Changes