The context length for Qwen2-57B-A14B is 32k, but its config.json sets both max_position_embeddings and sliding_window to 131072 by default, which seems incorrect. By comparison, Qwen2-57B-A14B-Instruct sets both fields to 32768, which appears more appropriate.
Link: https://huggingface.co/Qwen/Qwen2-57B-A14B/blob/main/config.json
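For anyone who wants to check this locally, here is a minimal sketch that compares the two context-related fields between two configs. It assumes the key is spelled `sliding_window` in config.json; the helper names (`context_settings`, `diff_context_settings`, `load_config`) are made up for illustration, and the sample values are only the ones reported in this post, not freshly downloaded.

```python
import json
from pathlib import Path

# Fields discussed in this post. The spelling `sliding_window` is an assumption
# about the key name used in config.json.
CONTEXT_KEYS = ("max_position_embeddings", "sliding_window")


def context_settings(config: dict) -> dict:
    """Return only the context-length fields (None if a key is missing)."""
    return {key: config.get(key) for key in CONTEXT_KEYS}


def diff_context_settings(base: dict, instruct: dict) -> dict:
    """Map each differing key to a (base_value, instruct_value) pair."""
    a = context_settings(base)
    b = context_settings(instruct)
    return {key: (a[key], b[key]) for key in CONTEXT_KEYS if a[key] != b[key]}


def load_config(path: str) -> dict:
    """Load a config.json that has already been downloaded to disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Values as reported in this post; replace with load_config(...) on real files.
    base = {"max_position_embeddings": 131072, "sliding_window": 131072}
    instruct = {"max_position_embeddings": 32768, "sliding_window": 32768}
    for key, (base_value, instruct_value) in diff_context_settings(base, instruct).items():
        print(f"{key}: base={base_value} instruct={instruct_value}")
```

To run it against the real files, download both config.json files and pass their paths to `load_config` instead of using the hard-coded dictionaries.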