-
Hello, Mistral Team!
Congrats on open-sourcing your model, and thanks a lot for your work! Inspired by its memory and compute efficiency and benchmark performance, I tried to re…
-
Really grateful for your work. I have a question about Table 2 in your experiments: are the last two columns using CogVLM and BLIP-2 directly for the VQA task? If so, `cogvlm vqa` can get the best result…
-
In [modify_llama.py](https://github.com/FMInference/H2O/blob/main/h2o_hf/utils_real_drop/modify_llama.py), the `hh_score` of `H2OCache` is computed by `attn_scores.sum(0).sum(1)`, resulting in a shape of [n…
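
For readers skimming the issue, here is a minimal sketch of the summation being described, with assumed shapes (illustrative dimensions only, not the actual H2O implementation):

```python
import torch

# Assumed attention-weight layout: [batch, num_heads, q_len, kv_len].
bsz, num_heads, q_len, kv_len = 1, 32, 16, 128
attn_scores = torch.softmax(torch.randn(bsz, num_heads, q_len, kv_len), dim=-1)

# Summing over the batch dim (0), then over the query dim (1 after the
# first sum), leaves one accumulated score per head per cached key.
hh_score = attn_scores.sum(0).sum(1)
print(hh_score.shape)  # torch.Size([32, 128]) -> [num_heads, kv_len]
```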
-
* Name of dataset: General Question Answering (GQA)
* URL of dataset: https://cs.stanford.edu/people/dorarad/gqa/
* License of dataset: https://creativecommons.org/licenses/by/4.0/
* Short descri…
-
Hello, first of all, thank you very much for providing the code. I have two questions:
1. Could you please let me know if the code mentioned in the description for generating annotations for other d…
-
Hi, have you tried implementing the Chai structure on models like LLaMA2 or any other models besides LLaMA and OPT? Looking forward to your response, thanks.
-
Dear author, if I now want to add the GQA dataset for training, what exactly do I need to do?
-
https://github.com/QwenLM/Qwen2/issues/259
The issue found in Qwen1.5 still exists in Qwen2; the affected models all appear to use GQA.
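
For context, GQA here means grouped-query attention, where each key/value head is shared by a group of query heads; this is the mechanism the affected models have in common. A minimal sketch with assumed dimensions (not Qwen's actual code):

```python
import torch

bsz, seq, head_dim = 1, 8, 64
num_q_heads, num_kv_heads = 32, 8      # 4 query heads share each KV head
group = num_q_heads // num_kv_heads

q = torch.randn(bsz, num_q_heads, seq, head_dim)
k = torch.randn(bsz, num_kv_heads, seq, head_dim)
v = torch.randn(bsz, num_kv_heads, seq, head_dim)

# Expand each KV head across its query-head group before attention.
k = k.repeat_interleave(group, dim=1)  # -> [bsz, num_q_heads, seq, head_dim]
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = attn @ v                         # [bsz, num_q_heads, seq, head_dim]
```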
-
I am using the provided SAS token to download this YAML file:
> coco_flickr30k_googlecc_gqa_sbu_oi_x152c4big2exp168.yaml
`./azcopy copy "https://biglmdiag.blob.core.windows.net/vinvl/pretrain_corpu…
-
May I ask whether this tool is currently unable to prune GQA models, such as Llama2-70B or Llama3?