Sohu, etched, 2024.06 - GitHub Issues

By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either.

It apparently cannot run most models other than transformers, but in exchange the chip is said to deliver inference 20x faster than an H100.

With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs.

No, no, no: 0.5M tokens/sec on Llama-70B is way too fast!!!
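To put the two quoted numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the 20x figure and the 500,000 tokens/sec figure refer to the same Llama 70B setup, which the note does not confirm. The per-user decoding rate of 50 tokens/sec is a purely hypothetical value chosen for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: the 20x speedup and the 500,000 tokens/sec figure describe
# the same Llama 70B workload; the source note does not state this.

SOHU_TOKENS_PER_SEC = 500_000  # quoted: "over 500,000 tokens per second"
CLAIMED_SPEEDUP = 20           # quoted in the note: 20x faster than an H100

# Hypothetical per-user decoding rate, for illustration only.
ASSUMED_PER_USER_TOKENS_PER_SEC = 50


def implied_baseline(tokens_per_sec: float, speedup: float) -> float:
    """Throughput the baseline would need for the speedup claim to hold."""
    return tokens_per_sec / speedup


def concurrent_streams(tokens_per_sec: float, per_user_rate: float) -> float:
    """Number of simultaneous streams the aggregate throughput could feed."""
    return tokens_per_sec / per_user_rate


if __name__ == "__main__":
    baseline = implied_baseline(SOHU_TOKENS_PER_SEC, CLAIMED_SPEEDUP)
    streams = concurrent_streams(
        SOHU_TOKENS_PER_SEC, ASSUMED_PER_USER_TOKENS_PER_SEC
    )
    print(f"Implied baseline throughput: {baseline:,.0f} tokens/sec")
    print(f"Streams at the assumed per-user rate: {streams:,.0f}")
```

Under these assumptions the claim implies a baseline of 25,000 tokens/sec. A figure like this is aggregate throughput across many batched requests rather than the speed of a single generation stream.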

AkihikoWatanabe / paper_notes, issue #1399