castorini / candle

PyTorch utilities for parameter pruning and multiplies reduction
MIT License

Experimental results #2

Open daemon opened 6 years ago

daemon commented 6 years ago

Across 300 queries, without subtracting idle power:

MoS: reproduced SOTA perplexity

Latency UI test: http://rocketeer.net/test

daemon commented 6 years ago

Need to have the following:

daemon commented 6 years ago

Subtracting idle power draw:
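For reference, a minimal sketch of one way the idle correction could be computed, assuming total energy is integrated over the run and idle draw is measured separately as an average wattage. The function name and the numbers in the usage line are hypothetical and are not taken from the measurements in this thread.

```python
def net_energy_joules(total_joules: float, idle_watts: float, seconds: float) -> float:
    """Energy attributable to the workload: total energy minus idle draw over the same interval."""
    if idle_watts < 0 or seconds < 0:
        raise ValueError("idle_watts and seconds must be non-negative")
    return total_joules - idle_watts * seconds


# Hypothetical example values, for illustration only.
net = net_energy_joules(total_joules=250.0, idle_watts=1.5, seconds=80.0)
print(f"{net:.1f} J attributable to the workload")
```

This treats idle draw as constant over the run; if the idle baseline drifts, it would have to be sampled alongside the workload instead.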

daemon commented 6 years ago

MoS with dynamic evaluation, after subtracting idle power draw:

daemon commented 6 years ago

MoS pruned results:

daemon commented 6 years ago

MoS final pruned results:

daemon commented 6 years ago

MoS pruned power results:

~40% reduction in time, ~33% reduction in power

daemon commented 6 years ago

PPL * J/q (lower is better)
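A small sketch of how this combined score could be used to rank configurations, assuming it is simply the product of perplexity and joules per query. The configuration names and values below are hypothetical placeholders, not results from this thread.

```python
def ppl_energy_product(perplexity: float, joules_per_query: float) -> float:
    """Combined quality/energy score; lower is better."""
    return perplexity * joules_per_query


# Hypothetical (perplexity, J/q) pairs, for illustration only.
configs = {
    "config_a": (60.0, 0.30),
    "config_b": (66.0, 0.21),
}

scores = {name: ppl_energy_product(ppl, jq) for name, (ppl, jq) in configs.items()}
best = min(scores, key=scores.get)
print(best, scores[best])
```

A product like this rewards a configuration that gives up a little perplexity for a larger energy saving, which is the trade-off pruning aims for.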

daemon commented 6 years ago

PTB results

AWD-LSTM:

100%

350: joules: 103.14503956365583, time: 77.97293639183044

294.67 mJ/q, 223 ms/q
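The per-query figures can be derived from the raw log line roughly as follows. This sketch assumes the leading `350` is the number of queries in the run, which is consistent with the reported values to within a small margin. The regular expression and function name are illustrative, not part of candle.

```python
import re

# Matches lines of the form "350: joules: 103.14..., time: 77.97..."
LOG_RE = re.compile(r"^\s*(\d+):\s*joules:\s*([0-9.]+),\s*time:\s*([0-9.]+)\s*$")


def per_query(line: str) -> tuple[float, float]:
    """Return (mJ per query, ms per query) for one raw log line."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    n_queries = int(m.group(1))  # assumption: leading integer is the query count
    joules = float(m.group(2))
    seconds = float(m.group(3))
    return joules / n_queries * 1000.0, seconds / n_queries * 1000.0


mj, ms = per_query("350: joules: 103.14503956365583, time: 77.97293639183044")
print(f"{mj:.2f} mJ/q, {ms:.0f} ms/q")
```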

4-layer QRNN:

100%

350: joules: 103.53337072038651, time: 78.45873022079468

295.81 mJ/q, 224 ms/q

80%

350: joules: 88.23621265220639, time: 65.85083532333374

252.1 mJ/q, 188 ms/q

60%

350: joules: 72.53076736021042, time: 50.882970571517944

207.2 mJ/q, 145 ms/q

40%

350: joules: 61.69936975002289, time: 43.90885138511658

176.63 mJ/q, 125.5 ms/q

20%

350: joules: 40.713961067676536, time: 29.174120903015137

116.33 mJ/q, 83.35 ms/q
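To compare the operating points, the relative savings against the 100% setting can be computed from the per-query figures reported above. This is a sketch; the percentage labels are copied from the thread without assuming what exactly they denote.

```python
# Per-query figures as reported above for the 4-layer QRNN on PTB:
# label -> (mJ per query, ms per query)
ptb_qrnn = {
    "100%": (295.81, 224.0),
    "80%": (252.1, 188.0),
    "60%": (207.2, 145.0),
    "40%": (176.63, 125.5),
    "20%": (116.33, 83.35),
}


def reduction_pct(baseline: float, value: float) -> float:
    """Percentage reduction of value relative to baseline."""
    return (1.0 - value / baseline) * 100.0


base_mj, base_ms = ptb_qrnn["100%"]
savings = {
    label: (reduction_pct(base_mj, mj), reduction_pct(base_ms, ms))
    for label, (mj, ms) in ptb_qrnn.items()
}

for label, (energy, time) in savings.items():
    print(f"{label}: {energy:.1f}% less energy, {time:.1f}% less time per query")
```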

daemon commented 6 years ago

WikiText-2 results

4-layer QRNN:

100%

350: joules: 136.39474933767315, time: 105.49620127677917

389.69 mJ/q, 301.41 ms/q

80%

350: joules: 121.47651870107649, time: 93.16583108901978

347.07 mJ/q, 266.18 ms/q

60%

350: joules: 100.46008950614932, time: 74.6132459640503

287.02 mJ/q, 213.18 ms/q

40%

350: joules: 84.76623452949524, time: 64.72594022750854

242.18 mJ/q, 184.93 ms/q

20%

350: joules: 58.218251074790956, time: 42.349632263183594

166.33 mJ/q, 120.99 ms/q