1a1a11a / libCacheSim

a high performance library for building cache simulators

feat: scale both size and result when use sample #83

Closed · xiaguan closed this 2 months ago

xiaguan commented 2 months ago

Sample ratio 0.1 on a small trace:

./bin/cachesim ../data/twitter_cluster52.csv csv lru 10kb,100kb,2mb,4mb,8mb,16mb,32mb,64mb -t "time-col=1, obj-id-col=2, obj-size-col=3, delimiter=," --sample-ratio 0.1

The sampling result is:

result/twitter_cluster52.csv                              LRU cache size       10KiB, 83463 req, miss ratio 0.6038, byte miss ratio 0.6853
result/twitter_cluster52.csv                              LRU cache size      100KiB, 83463 req, miss ratio 0.4145, byte miss ratio 0.4422
result/twitter_cluster52.csv                              LRU cache size     2048KiB, 83463 req, miss ratio 0.2321, byte miss ratio 0.2352
result/twitter_cluster52.csv                              LRU cache size     4096KiB, 83463 req, miss ratio 0.1957, byte miss ratio 0.1939
result/twitter_cluster52.csv                              LRU cache size     8192KiB, 83463 req, miss ratio 0.1664, byte miss ratio 0.1614
result/twitter_cluster52.csv                              LRU cache size    16384KiB, 83463 req, miss ratio 0.1491, byte miss ratio 0.1430
result/twitter_cluster52.csv                              LRU cache size    32768KiB, 83463 req, miss ratio 0.1440, byte miss ratio 0.1373
result/twitter_cluster52.csv                              LRU cache size    65536KiB, 83463 req, miss ratio 0.1440, byte miss ratio 0.1373

Without sampling:

result/twitter_cluster52.csv                              LRU cache size       10KiB, 1000000 req, miss ratio 0.6118, byte miss ratio 0.6522
result/twitter_cluster52.csv                              LRU cache size      100KiB, 1000000 req, miss ratio 0.4061, byte miss ratio 0.4348
result/twitter_cluster52.csv                              LRU cache size     2048KiB, 1000000 req, miss ratio 0.2263, byte miss ratio 0.2348
result/twitter_cluster52.csv                              LRU cache size     4096KiB, 1000000 req, miss ratio 0.1920, byte miss ratio 0.1963
result/twitter_cluster52.csv                              LRU cache size     8192KiB, 1000000 req, miss ratio 0.1652, byte miss ratio 0.1670
result/twitter_cluster52.csv                              LRU cache size    16384KiB, 1000000 req, miss ratio 0.1483, byte miss ratio 0.1483
result/twitter_cluster52.csv                              LRU cache size    32768KiB, 1000000 req, miss ratio 0.1435, byte miss ratio 0.1431
result/twitter_cluster52.csv                              LRU cache size    65536KiB, 1000000 req, miss ratio 0.1435, byte miss ratio 0.1431
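For readers skimming the comparison, here is a minimal C++ sketch of what "scale both size and result" could mean under spatial sampling. This is not libCacheSim's actual code, and the function names and the miss count are illustrative: the cache size handed to the eviction simulation is scaled down by the sample ratio, absolute counts are scaled back up by its inverse, and the miss ratio itself needs no scaling because it is a ratio of two sampled quantities.

// Illustrative sketch only -- not libCacheSim's actual implementation.
#include <cstdint>
#include <cstdio>

// Cache size actually simulated when a sample ratio is applied (assumed behavior).
static uint64_t scaled_cache_size(uint64_t configured_size, double ratio) {
  return static_cast<uint64_t>(static_cast<double>(configured_size) * ratio);
}

int main() {
  const double ratio = 0.1;                      // --sample-ratio 0.1
  const uint64_t configured_size = 10ULL * 1024; // 10 KiB, as in the run above
  const uint64_t sampled_req = 83463;            // requests left after sampling
  const uint64_t sampled_miss = 50395;           // illustrative; roughly 0.6038 * 83463

  std::printf("simulate with cache size %llu bytes\n",
              (unsigned long long)scaled_cache_size(configured_size, ratio));
  std::printf("estimated full-trace requests: %.0f\n", sampled_req / ratio);
  std::printf("miss ratio (needs no scaling): %.4f\n",
              (double)sampled_miss / (double)sampled_req);
  return 0;
}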
1a1a11a commented 2 months ago

Thank you for the work! However, I think it is not appropriate to scale the cache size, because it would cause confusion.

I would like to separate the sampling part from libCacheSim, because online sampling does not achieve the expected speedup: trace reading is the expensive part. I am open to discussion.

xiaguan commented 2 months ago

I agree that this implementation will not alleviate the pressure of reading traces. However, I am not quite clear on what you mean by separating the sampling part from libCacheSim.

1a1a11a commented 2 months ago

I think the sampling ratio should not affect the cache size; otherwise users will be confused.

By separating the sampling part, I mean that we can create separate targets for sampling the traces. I would prefer removing sampling support in a future version. cachesim is a simulation tool and should do one thing and do it well.
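One way to read "separate targets for sampling the traces" is a standalone preprocessing tool that writes a sampled trace to disk once, so cachesim itself never pays the cost of reading the full trace online. The sketch below is a hypothetical spatial sampler in C++, not an existing libCacheSim target; the CSV column layout (time, obj-id, obj-size) and the 0.1 ratio are assumptions mirroring the command used above.

// Hypothetical standalone sampler -- not part of libCacheSim.
// Hashing the object id keeps or drops whole objects, so per-object reuse
// patterns in the sampled trace stay intact.
#include <cstdint>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

int main() {
  const double sample_ratio = 0.1;  // keep roughly 10% of objects
  const size_t threshold =
      static_cast<size_t>(sample_ratio * static_cast<double>(SIZE_MAX));
  std::hash<std::string> hasher;

  std::string line;
  while (std::getline(std::cin, line)) {
    // Extract the second CSV field (obj-id) -- assumed column layout.
    std::istringstream fields(line);
    std::string time_col, obj_id;
    if (!std::getline(fields, time_col, ',') || !std::getline(fields, obj_id, ','))
      continue;  // skip malformed lines
    if (hasher(obj_id) <= threshold)
      std::cout << line << '\n';  // object is in the sample: keep the request
  }
  return 0;
}

With something like this, the sampled file could be produced once (e.g. ./sampler < twitter_cluster52.csv > sampled.csv) and cachesim then run on sampled.csv without --sample-ratio, which addresses the point that trace-reading cost dominates.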