HazyResearch / fly

Apache License 2.0

Monarch & PixelFly based MLP layer efficiency testing #11

Open zhujiem opened 1 year ago

zhujiem commented 1 year ago

Here I post some efficiency numbers for a Monarch-based MLP vs. a vanilla nn.Linear-based MLP. I found that Monarch is best suited to MLPs in Transformer architectures, which generally have a large hidden size and a large batch size. In recommendation models, by contrast, the MLP is usually small (e.g., 10000x1024x512, where the first number is the feature input dimension) and, importantly, a small batch size (say 10) is often used for serving under concurrent online requests. The numbers below are provided as a reference for anyone with similar tasks.

| Batch size | Model | Train (Fwd+Bwd), GPU-P100 | Test (Fwd only), GPU-P100 | Test (Fwd only), CPU |
|---|---|---|---|---|
| 1000 | MLP (10000x1024x512) | 2.95 ms | 0.16 ms | 26.57 ms |
| 1000 | Monarch (nblk=4) | 1.85 ms | 0.57 ms | 10.29 ms |
| 1000 | Monarch (nblk=16) | 1.37 ms | 0.55 ms | 5.67 ms |
| 10 | MLP (10000x1024x512) | 0.48 ms | 0.13 ms | 0.59 ms |
| 10 | Monarch (nblk=4) | 1.34 ms | 0.54 ms | 1.16 ms |
| 10 | Monarch (nblk=16) | 1.31 ms | 0.52 ms | 1.37 ms |
| 10000 | MLP (1024x1024x512) | 4.86 ms | 0.13 ms | 46.99 ms |
| 10000 | Monarch (nblk=4) | 6.87 ms | 0.53 ms | 47.55 ms |
| 10000 | Monarch (nblk=16) | 6.04 ms | 0.51 ms | 39.66 ms |
| 1000 | MLP (1024x1024x512) | 0.74 ms | 0.16 ms | 5.35 ms |
| 1000 | Monarch (nblk=4) | 1.42 ms | 0.53 ms | 4.17 ms |
| 1000 | Monarch (nblk=16) | 1.38 ms | 0.52 ms | 3.84 ms |
| 10 | MLP (1024x1024x512) | 0.46 ms | 0.13 ms | 0.27 ms |
| 10 | Monarch (nblk=4) | 1.29 ms | 0.53 ms | 1.15 ms |
| 10 | Monarch (nblk=16) | 1.27 ms | 0.51 ms | 0.84 ms |
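For intuition on why the gap narrows or widens with `nblk`, a rough parameter count helps. A Monarch layer replaces a dense weight with (roughly) a product of two block-diagonal factors of `nblk` blocks each; the sketch below, which assumes a square d x d weight and ignores the permutations (they add no parameters), is my own back-of-the-envelope arithmetic, not code from the fly repo:

```python
# Rough parameter counts: dense linear layer vs. a Monarch-style
# factorization into two block-diagonal matrices with `nblk` blocks.
# Illustrative sketch only; the real MonarchLinear in the fly repo
# supports rectangular shapes and differs in detail.

def dense_params(d_in: int, d_out: int) -> int:
    """Weight parameters of a dense d_in x d_out linear map (no bias)."""
    return d_in * d_out

def monarch_params(d: int, nblk: int) -> int:
    """Weight parameters of two block-diagonal d x d factors,
    each with nblk square blocks of size (d // nblk)."""
    assert d % nblk == 0, "block count must divide the dimension"
    blk = d // nblk
    # one factor: nblk blocks of blk*blk params; two factors total
    return 2 * nblk * blk * blk

if __name__ == "__main__":
    d = 1024
    print(dense_params(d, d))      # 1048576
    print(monarch_params(d, 4))    # 524288  (2x fewer)
    print(monarch_params(d, 16))   # 131072  (8x fewer)
```

So larger `nblk` shrinks parameters and FLOPs, but at small batch sizes the per-layer kernel-launch and permutation overhead dominates, which is consistent with the batch_size=10 rows above.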

I will post the numbers for PixelFly later.

leoozy commented 1 year ago

Could you please share your evaluation code for monarch? Thank you!

zhujiem commented 1 year ago

https://gist.github.com/justheuristic/9e4fb81381451a4bc8cbfee0a5100eba I reused the code from this script. Just change `from pixelfly import PixelflyLinear` to import `MonarchLinear` from the official implementation.
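For reference, a minimal forward-latency harness in the spirit of the table above. This is my own sketch in plain PyTorch, not the gist itself; a `MonarchLinear` layer would be dropped in place of `nn.Linear` the same way (its constructor arguments follow the fly repo and are not shown here):

```python
# Minimal forward-only latency benchmark for an MLP (illustrative sketch).
import time
import torch
import torch.nn as nn

def time_forward(model: nn.Module, x: torch.Tensor,
                 n_iters: int = 100, warmup: int = 10) -> float:
    """Average forward-only latency in milliseconds."""
    with torch.no_grad():
        for _ in range(warmup):           # warm up caches / autotuning
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()      # GPU kernels run async
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_iters * 1e3

if __name__ == "__main__":
    # 10000x1024x512 MLP, batch_size=10, matching the serving scenario above.
    mlp = nn.Sequential(nn.Linear(10000, 1024), nn.ReLU(), nn.Linear(1024, 512))
    x = torch.randn(10, 10000)
    print(f"CPU fwd latency: {time_forward(mlp, x):.2f} ms")
```

Training (fwd+bwd) timing additionally needs a loss, `loss.backward()`, and the same `torch.cuda.synchronize()` discipline around the timed region.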

abhishektyaagi commented 8 months ago

Hi @zhujiem, may I ask whether you were able to run the training script? And if so, what does your environment look like?