forhaoliu / ringattention

Transformers with Arbitrarily Large Context
Apache License 2.0

Questions about the paper #14

Open hiroshinoji opened 7 months ago

hiroshinoji commented 7 months ago

First, great work! I read the paper and had a few questions.

ZhaiFeiyue commented 6 months ago

Maybe the 6 is HBM bandwidth / interconnect bandwidth?

forhaoliu commented 6 months ago

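The 6 comes from the number of blocks each host has to store: the key and value blocks received from the previous host, the key and value blocks used for the current computation, and the current query and output blocks. In total that is 2×2 + 1 + 1 = 6.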
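For anyone who wants to check the arithmetic, here is a minimal sketch of that count. The function names, the `(batch, block_size, hidden)` block shape, and the 2-byte element size are illustrative assumptions, not taken from the repository.

```python
def ring_attention_block_count() -> int:
    """Count the blocks each host holds, following the breakdown above."""
    kv_tensors = 2      # key and value
    kv_buffers = 2      # one set arriving from the previous host, one set in use
    query_blocks = 1    # current query block
    output_blocks = 1   # current output block
    return kv_tensors * kv_buffers + query_blocks + output_blocks


def activation_bytes(batch: int, block_size: int, hidden: int,
                     bytes_per_element: int = 2) -> int:
    """Rough per-host activation memory, assuming every block has
    shape (batch, block_size, hidden) and a fixed element size (assumption)."""
    elements_per_block = batch * block_size * hidden
    return ring_attention_block_count() * elements_per_block * bytes_per_element


print(ring_attention_block_count())  # 6
```

With these assumptions, the memory needed scales with the block size and not with the full sequence length, since only six blocks are held at any time.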