Closed Yard1 closed 3 weeks ago
@yzh119 Please let me know if this is on the right track! I couldn't see anything directly related to the dtype of the query in the kernels, so my assumption is that this should "just work", but I don't know whether it will affect e.g. q_vec loading. I am compiling it to test it right now.
Yes I do think you are on the right track, thank you!
> but I don't know if this will not affect eg. q_vec loading.
I don't think so.
@yzh119 The modified unit test passes for me, can you review and validate?
@yzh119 correct, I wanted to avoid having to modify the public API. I don't think the information about the query dtype will be used in resource estimation, but please correct me if that's not the case - happy to do the change then
Hi @Yard1, I'm a little conservative here: this section of code might produce a different num_blocks_per_sm because the qtype in the kernel differs.
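To illustrate the concern, here is a minimal sketch of why a per-block resource estimate can depend on the query dtype. It is a toy model, not FlashInfer's actual estimation code: the function names, the tile sizes, the 100 KiB shared-memory budget, and the assumption that a query tile is staged in shared memory are all hypothetical.

```python
# Toy occupancy model. NOT FlashInfer's resource-estimation code:
# every name and constant here is a hypothetical chosen for illustration.

DTYPE_BYTES = {"float8": 1, "float16": 2, "float32": 4}


def smem_bytes_per_block(q_dtype, kv_dtype, head_dim=128, q_rows=64,
                         kv_rows=32, num_stages=2):
    """Shared memory one block would need under this toy model."""
    # Assumption: a (q_rows x head_dim) query tile is staged in shared memory.
    q_tile = q_rows * head_dim * DTYPE_BYTES[q_dtype]
    # K and V tiles, each buffered once per pipeline stage.
    kv_tiles = 2 * num_stages * kv_rows * head_dim * DTYPE_BYTES[kv_dtype]
    return q_tile + kv_tiles


def num_blocks_per_sm(smem_per_block, smem_per_sm=100 * 1024):
    """How many blocks fit on one SM if shared memory is the only limit."""
    return smem_per_sm // smem_per_block


# Same KV dtype, different query dtype.
fp16_q = num_blocks_per_sm(smem_bytes_per_block("float16", "float16"))
fp32_q = num_blocks_per_sm(smem_bytes_per_block("float32", "float16"))
print(fp16_q, fp32_q)  # prints "2 1"
```

In this model, changing only the query dtype halves the number of resident blocks, so an estimate made for one query dtype would not carry over to a kernel instantiated with another. In a real kernel the difference could also come from register usage rather than shared memory.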
Ok sounds good! Let me make the change.
@yzh119 Updated, ptal!
Closes https://github.com/flashinfer-ai/flashinfer/issues/285
Modified unit tests pass. May need some extra validation.