HazyResearch / fly

Apache License 2.0

Monarch & PixelFly based MLP layer efficiency testing #11

Open zhujiem opened 1 year ago

zhujiem commented 1 year ago

Here I post some efficiency numbers for a Monarch-based MLP vs. a vanilla nn.Linear-based MLP. I found that Monarch is best suited to MLPs in Transformer architectures, which generally have a large hidden size and a large batch size. In recommendation models, by contrast, the MLP is usually small (e.g., 10000x1024x512, where the first number is the feature input dimension) and, importantly, a small batch size (say 10) is often used for serving under concurrent online requests. The numbers below are provided as a reference for anyone with similar tasks.

| Batch size | Model | Train (Fwd+Bwd), GPU-P100 | Test (Fwd only), GPU-P100 | Test (Fwd only), CPU |
|---|---|---|---|---|
| 1000 | MLP (10000x1024x512) | 2.95 ms | 0.16 ms | 26.57 ms |
| 1000 | Monarch (nblk=4) | 1.85 ms | 0.57 ms | 10.29 ms |
| 1000 | Monarch (nblk=16) | 1.37 ms | 0.55 ms | 5.67 ms |
| 10 | MLP (10000x1024x512) | 0.48 ms | 0.13 ms | 0.59 ms |
| 10 | Monarch (nblk=4) | 1.34 ms | 0.54 ms | 1.16 ms |
| 10 | Monarch (nblk=16) | 1.31 ms | 0.52 ms | 1.37 ms |
| 10000 | MLP (1024x1024x512) | 4.86 ms | 0.13 ms | 46.99 ms |
| 10000 | Monarch (nblk=4) | 6.87 ms | 0.53 ms | 47.55 ms |
| 10000 | Monarch (nblk=16) | 6.04 ms | 0.51 ms | 39.66 ms |
| 1000 | MLP (1024x1024x512) | 0.74 ms | 0.16 ms | 5.35 ms |
| 1000 | Monarch (nblk=4) | 1.42 ms | 0.53 ms | 4.17 ms |
| 1000 | Monarch (nblk=16) | 1.38 ms | 0.52 ms | 3.84 ms |
| 10 | MLP (1024x1024x512) | 0.46 ms | 0.13 ms | 0.27 ms |
| 10 | Monarch (nblk=4) | 1.29 ms | 0.53 ms | 1.15 ms |
| 10 | Monarch (nblk=16) | 1.27 ms | 0.51 ms | 0.84 ms |
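For intuition on why the gap narrows or widens with `nblk`, a rough parameter count helps. A Monarch layer replaces a dense weight with (roughly) a product of two block-diagonal factors of `nblk` blocks each; the sketch below, which assumes a square d x d weight and ignores the permutations (they add no parameters), is my own back-of-the-envelope arithmetic, not code from the fly repo:

```python
# Rough parameter counts: dense linear layer vs. a Monarch-style
# factorization into two block-diagonal matrices with `nblk` blocks.
# Illustrative sketch only; the real MonarchLinear in the fly repo
# supports rectangular shapes and differs in detail.

def dense_params(d_in: int, d_out: int) -> int:
    """Weight parameters of a dense d_in x d_out linear map (no bias)."""
    return d_in * d_out

def monarch_params(d: int, nblk: int) -> int:
    """Weight parameters of two block-diagonal d x d factors,
    each with nblk square blocks of size (d // nblk)."""
    assert d % nblk == 0, "block count must divide the dimension"
    blk = d // nblk
    # one factor: nblk blocks of blk*blk params; two factors total
    return 2 * nblk * blk * blk

if __name__ == "__main__":
    d = 1024
    print(dense_params(d, d))      # 1048576
    print(monarch_params(d, 4))    # 524288  (2x fewer)
    print(monarch_params(d, 16))   # 131072  (8x fewer)
```

So larger `nblk` shrinks parameters and FLOPs, but at small batch sizes the per-layer kernel-launch and permutation overhead dominates, which is consistent with the batch_size=10 rows above.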

I will post the numbers for PixelFly later.

leoozy commented 1 year ago

Could you please share your evaluation code for monarch? Thank you!

zhujiem commented 1 year ago

https://gist.github.com/justheuristic/9e4fb81381451a4bc8cbfee0a5100eba I reused the code from this script. Just change `from pixelfly import PixelflyLinear` to import `MonarchLinear` from the official implementation.
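For reference, a minimal forward-latency harness in the spirit of the table above. This is my own sketch in plain PyTorch, not the gist itself; a `MonarchLinear` layer would be dropped in place of `nn.Linear` the same way (its constructor arguments follow the fly repo and are not shown here):

```python
# Minimal forward-only latency benchmark for an MLP (illustrative sketch).
import time
import torch
import torch.nn as nn

def time_forward(model: nn.Module, x: torch.Tensor,
                 n_iters: int = 100, warmup: int = 10) -> float:
    """Average forward-only latency in milliseconds."""
    with torch.no_grad():
        for _ in range(warmup):           # warm up caches / autotuning
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()      # GPU kernels run async
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_iters * 1e3

if __name__ == "__main__":
    # 10000x1024x512 MLP, batch_size=10, matching the serving scenario above.
    mlp = nn.Sequential(nn.Linear(10000, 1024), nn.ReLU(), nn.Linear(1024, 512))
    x = torch.randn(10, 10000)
    print(f"CPU fwd latency: {time_forward(mlp, x):.2f} ms")
```

Training (fwd+bwd) timing additionally needs a loss, `loss.backward()`, and the same `torch.cuda.synchronize()` discipline around the timed region.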

abhishektyaagi commented 8 months ago

Hi @zhujiem, may I ask whether you were able to run the training script? And if so, what does your environment look like?