Open AkihikoWatanabe opened 2 months ago
By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either.
transformer以外の大抵のモデルでは動作しないが、代わりにH-100よりも20倍早いinferenceを実現できるチップらしい。
With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs.
いやいやいやLlama-70Bで0.5M Token/secは早すぎる!!!
https://www.etched.com/announcing-etched