Minor questions about the paper and code

Thanks a lot for the interesting work! I am really enjoying reading the paper and the code. I have a few minor questions and would really appreciate any hints:

1. I notice that, in Section 5.1, Pixelfly is applied only to the projection step of the attention and MLP layers, without sparsifying the attention matrix (score matrix). In T2T-ViT, by contrast, Pixelfly is applied only to the attention matrix, without sparsifying the MLP and projection layers. Is there a reason for this difference? Also, are there any experimental results with Pixelfly applied to all layers (MLP and attention matrix)?
2. I saw that there are many options for /model/t2tattn_cfg with T2T-ViT, such as sblocal and performer. It seems that sblocal uses sparse + low rank. May I know which one I should choose if I want to use flat butterfly + low rank?
3. In the experiment folder under config, it seems that only the scripts for MLP-Mixer and T2T-ViT are provided. Do you plan to release the scripts for the other experiments, such as ViT and GPT?

HazyResearch / fly