Amshaker / SwiftFormer

[ICCV'23] Official repository of the paper "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"

SwiftFormer meets Android #14

Open escorciav opened 8 months ago

escorciav commented 8 months ago

As mentioned in #13, I forked the project to bring SwiftFormer onto Android (on Qualcomm hardware).

As of today, the performance of a single block is not encouraging: just under 2.2 msec. Measured on an S23 Ultra (Snapdragon 8 Gen 2) with QNN 2.16, details here

escorciav commented 8 months ago

Update: the results were so discouraging that I had to benchmark the full SwiftFormer_L1 (as in the paper?). The results on an S23 Ultra (Snapdragon 8 Gen 2) with QNN 2.16 are worse than on the iPhone, but perhaps decent: under 2.7 msec.

Amshaker commented 8 months ago

Thank you for the update.

Could you kindly run benchmarks for MobileViT (or MobileViTv2 ×1) in addition to EfficientFormer_L1? I understand that we may not achieve exactly the same performance on the S23 (Ultra) as observed on the iPhone 14 (Pro Max) due to variations in hardware.

Please note that EfficientFormer_L1 has demonstrated comparable speed to SwiftFormer_L1 on the iPhone 14 (Pro Max). If you manage to replicate EfficientFormer_L1 on the S23 Ultra with a runtime of 2.63 msec, it suggests that the ANE of the iPhone 14 Pro Max is faster than the GPU or ANE on the S23 Ultra. If EfficientFormer_L1 significantly outperforms SwiftFormer_L1, it may indicate that the activations, normalization, and certain layers of SwiftFormer_L1 are not optimized for the S23 Ultra. This could mean that SwiftFormer requires additional optimization for optimal performance on this hardware.

I would appreciate your thoughts on this proposed plan.

Thank you.

escorciav commented 8 months ago

Agree, SwiftFormer_L1 (PyTorch implementation) + QNN 2.16 (+ my way of porting) may be leaving room for optimization.

:wink: I will leave it to someone else as:

  1. I'm kinda happy with the runtime,
  2. I'm not interested in the architectures mentioned above atm :laughing:, and
  3. Qualcomm does not pay my bills :upside_down_face: (for optimizing third-party models on their hardware).

Perhaps add/edit your message with the relevant links for those architectures :blush:

Amshaker commented 8 months ago

I can do that soon and will update you 😄

I would be grateful if you could provide details on the steps or requirements involved in measuring the inference time on the S23 Ultra. For iOS, Apple has introduced a valuable feature in their IDE (Xcode 14) that allows for the measurement of prediction time, load time, and compilation time. Could you please share this information or update the forked repository with those specific details for Android? I am following your repo and have already checked the export file.

escorciav commented 8 months ago

There are multiple ways to port an ML model to Android :blush:. Feel free to rename the issue accordingly. I wrote it that way for marketing reasons :wink:

My approach is specific to Qualcomm hardware using QNN.

  1. My fork has the script used to export the model to ONNX (a minimal export sketch follows this list).
  2. Then, it's just the QNN pipeline:
    1. conversion to cpp,
    2. model library generation,
    3. (optional, yet recommended for fast inference & quicker trials) context (aka NPU/DSP/GPU) library generation.
  3. Profiling & execution of the binaries from step 2.
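
For reference, here's a minimal sketch of step 1. It assumes the repo exposes a `SwiftFormer_L1` factory from its `models` package; the input shape, tensor names, and opset below are illustrative choices, not the exact settings of my script:

```python
# Minimal ONNX-export sketch (step 1 above). The SwiftFormer_L1 factory is
# assumed to come from this repo's models package; shapes/opset are illustrative.
import torch

from models import SwiftFormer_L1  # assumption: factory exported by the repo

model = SwiftFormer_L1(pretrained=False).eval()
dummy = torch.randn(1, 3, 224, 224)  # standard ImageNet-sized input

torch.onnx.export(
    model,
    dummy,
    "swiftformer_l1.onnx",
    input_names=["images"],
    output_names=["logits"],
    opset_version=13,        # pick an opset your QNN converter version supports
    do_constant_folding=True,
)
```

The resulting `.onnx` file is what feeds the QNN conversion in step 2.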

I'm preparing a tutorial for other folks in my org. I will share the slides later in Q1/Q2.

escorciav commented 8 months ago

Attaching the latency results.

The JSON files were generated with an internal/private tool. However, the QNN docs provide all the info needed to parse the binary profiling results from step 3. The TXT file was generated by a tiny wrapper digesting the JSON; a sketch of such a wrapper is below the attachments.

report_ops.txt model.iters-100.qnn.int8.json model.iters-100.qnn.int8_basic.json
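
In case it helps anyone writing their own wrapper, here's a hypothetical sketch of digesting such a JSON into a per-op report. The schema assumed here (a top-level `"nodes"` mapping of op name to average microseconds) is made up for illustration, since the attached files came from an internal tool; adapt the keys to whatever your parser emits:

```python
# Hypothetical per-op latency digester. The "nodes" schema is an assumption;
# adapt the key names to the JSON your profiling parser actually produces.
import json
from pathlib import Path


def report_ops(json_path: str, top_k: int = 10) -> str:
    profile = json.loads(Path(json_path).read_text())
    ops = profile["nodes"]  # assumed layout: {op_name: avg_time_us}
    total_us = sum(ops.values())
    lines = [f"total: {total_us / 1000:.3f} ms"]
    # List the top_k slowest ops with their share of total runtime.
    for name, usec in sorted(ops.items(), key=lambda kv: kv[1], reverse=True)[:top_k]:
        lines.append(f"{usec / total_us:6.1%}  {usec:10.1f} us  {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report_ops("model.iters-100.qnn.int8.json"))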

escorciav commented 8 months ago

(Perhaps) good news: the latency of the block I'm interested in improving got a 1.27x speed-up by using QNN >= 2.17.

With enough :star:s on my fork, I may be persuaded to benchmark SwiftFormer L1 :blush: :rofl:

Amshaker commented 8 months ago

That's great! 🚀 You have one star now, come on! 🤣

If you benchmark the SwiftFormer models (let's say L1), we can do a pull request and I will add you as a contributor to the main repo with a special shoutout in the acknowledgments 👀. Isn't it a good deal? 🤣

escorciav commented 8 months ago

Pushed the latency results for SwiftFormer_L1 with QNN 2.17 & 2.18. The improvement is as much as 1.16x.

> we can do a pull request and I will add you as a contributor to the main repo with a special shoutout in the acknowledgments 👀. Isn't it a good deal?

Done with 80% of my duties. Awaiting instructions for the remaining 20% & collecting the brownie points mentioned earlier :cookie:

Amshaker commented 8 months ago

You have my word on it 💯. Here we go!

Please create a pull request against the readme file of the main repo with the following change: create a new sub-section under "Latency Measurement" named SwiftFormer meets Android (I liked the name). In this section, you can add the two tables (SwiftFormer Encoder & SwiftFormer-L1) for the latency measurements with the QNN variants (feel free to add the scripts as well). A sketch of the sub-section is below. Then, I will check & merge the pull request and you will automatically be added as a contributor! 🚀 Following this, I'll update the acknowledgment, earning you a well-deserved second brownie 🍪
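
For concreteness, something like the following (the `<ms>` cells are placeholders for your measured numbers; the device and QNN versions are the ones from this thread):

```markdown
### SwiftFormer meets Android

Latency on a Galaxy S23 Ultra (Snapdragon 8 Gen 2):

| Model               | QNN 2.16 | QNN 2.17 | QNN 2.18 |
|---------------------|----------|----------|----------|
| SwiftFormer Encoder | <ms>     | <ms>     | <ms>     |
| SwiftFormer-L1      | <ms>     | <ms>     | <ms>     |
```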

escorciav commented 8 months ago

Thanks for merging 🥰. Let's keep the issue open for 6-12 months in case someone else is interested in improving runtime performance, or in exploring other porting avenues for Android 😉