cssl-unist / tweezer


How to interpret test output? #15

Open · donporter opened this issue 3 weeks ago

donporter commented 3 weeks ago

Hi,

I wanted to calibrate my local setup of TWEEZER against the prior results. I could use a bit of help confirming I understand everything. I know my hardware isn't exactly the same, but I also suspect that the constraints of SGX1 dominate things like differences in clock frequency or untrusted RAM size.

As an example: to reproduce the first bar of Figure 3 in the TWEEZER paper (r100), which I read as ~28k ops/s, I believe this is the right line in config.csv:

B;;R100_16_1024 ;;../binary/tweezer; 32;8; 5000000; 1;
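For reference, here is my guess at what each field means; the labels below are my own guesses, not something documented in the repo:

```
B;;R100_16_1024 ;;../binary/tweezer; 32;8; 5000000; 1;
#    ^ workload name: R100 = 100% reads, 16 = 16 GiB KVS (?), 1024 = value size in bytes (?)
#                 ^ path to the Tweezer binary
#                                    ^ block size in KB (?)
#                                       ^ threads (?)
#                                          ^ number of operations (5M)
#                                                   ^ repetitions (?)
```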

Is this correct?

In output/[date]-txt, I see this line, which I think is the throughput:

readrandom : 134.818 micros/op 7417 ops/sec; 7.4 MB/s (5000000 of 5000000 found)
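In case it matters, I'm pulling that number out of the output file with a quick one-liner, nothing fancy:

```
# grab the ops/sec figure (5th field of the readrandom line)
grep readrandom output/*-txt | awk '{print $5, "ops/sec"}'
```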

Am I reading this right, that I am measuring ~7k ops/s on my local setup?

If I understand things correctly so far, any thoughts on why I'm seeing a ~4x difference? If I am mistaken, any help with setting up or interpreting the experiment correctly would be much appreciated.

Thanks, Don

hyungon-moon commented 2 weeks ago

Hi,

B;;R100_16_1024 ;;../binary/tweezer; 32;8; 5000000; 1;

This line runs the test with a 32 KB block size. It's been a while, and I agree it can be confusing, but for Tweezer in Figure 3 we used a 4 KB block size (see the first paragraph of 7.1). I also revisited the eval sheet we created before submission and found that the number you report (~7k ops/s) matches ours for the 32 KB block size (7,776 ops/s); that is the number Figure 6 uses.
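Concretely, if you change the block-size field of that line from 32 to 4 and keep everything else the same, you should get the Figure 3 configuration:

```
B;;R100_16_1024 ;;../binary/tweezer; 4;8; 5000000; 1;
```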

Thanks, Hyungon

donporter commented 2 weeks ago

Thank you for the clarifications. When I run with a block size of 4 KB, I get about 22 kops/s, which is much closer, but not the same. Do you think this is within expected variance, or could something else differ in my setup?

I am running Ubuntu 18.04 (same as yours) with kernel 4.19 (close to your 4.15) and the Fortanix out-of-tree SGX driver. The CPU is a 4-core Intel(R) Xeon(R) E3-1220 v6 @ 3.00GHz. I don't see a clock speed or core count in the paper (I'm guessing you had 4 cores from Figure 8), but my CPU is Kaby Lake while the paper's is Coffee Lake, so the paper's microarchitecture is one generation newer. The system has 32 GiB of RAM (whereas your test system has 64 GiB). I ran these on an HDD; I could also run on a commodity SSD (the paper doesn't specify the persistent storage device used in the experiments).
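For completeness, the details above came from the usual tools, e.g.:

```
uname -r                                  # kernel version
lscpu | grep -E 'Model name|^CPU\(s\)'    # CPU model and core count
free -h | grep Mem                        # installed RAM
lsblk -d -o NAME,ROTA                     # ROTA=1 -> HDD, 0 -> SSD
```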

The most likely culprit is RAM size; do you have any experiments on your end about this? I can probably figure out how to cobble more RAM together to see if the results are sensitive to this on my end.

Part of what confused me is this text:

In this experiment, we issue 5 million transactions starting from a KVS filled with 5 million entries, set the value size to 1024 B, and set the block size to 32 KB to replicate the experiments as close to those of the original setting

This text is in Section 7 (the paragraph before 7.1), which appears to correspond to Figure 3; I read the text in 7.1 as corresponding to Figure 4 and onward.

I'm also still a bit confused about Figure 6. My reading of it for the 100% read workload is that even at a 32 KiB block size, Tweezer reports 12-18k ops/s, which is about double the ~7k ops/s I am getting. I also can't tell whether I am running the 16 GiB or 64 GiB KVS experiment. How does one configure this?
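My guess from the workload naming is that the 16 in R100_16_1024 selects the KVS size, so a 64 GiB run would look something like the line below, but I haven't verified this:

```
B;;R100_64_1024 ;;../binary/tweezer; 32;8; 5000000; 1;   # 64 GiB KVS? (my guess, unverified)
```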

Best, Don

hyungon-moon commented 2 weeks ago

It's hard to tell, but I suspect the difference is due to the difference in main memory size. To access untrusted memory from a SCONE app, we reserve a certain amount of memory solely for that purpose, and this reduces the memory actually available to the enclave-protected KVS. (You may find some details in #3.)

To provide you with more details:
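One relevant knob is the size of the enclave heap, which SCONE reads from the standard SCONE_HEAP environment variable (this is generic SCONE configuration, not something specific to this repo):

```
# SCONE_HEAP sets the enclave heap size for the SCONE app
export SCONE_HEAP=24G
# then launch the benchmark as usual
```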

donporter commented 2 weeks ago

I was able to borrow RAM from another machine and increase the test system's main memory to 64 GiB. What is strange is that the test's performance drops with more memory.

Throughput drops from 22 kops/s to 5 kops/s with double the main memory. I tried changing SCONE_HEAP to 24G and 32G; neither had much effect.

Any ideas why this is happening?

An EPC size difference seems the most likely explanation for what is happening. I do have an Ice Lake machine, but we hit the same issue as #12. I'll ask around for access to a Coffee Lake server to try that.
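In case it helps with comparing machines, this is roughly how I've been checking EPC sizes (generic SGX tooling, nothing repo-specific):

```
# SGX leaf 0x12, subleaf 2 reports the EPC section base/size
cpuid -1 -l 0x12 -s 2
# with the in-tree driver, the EPC also shows up in the boot log
dmesg | grep -i sgx
```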

If you are able to figure out how many threads you used for this experiment, that would be cool.

Thanks, Don