-
Dear Xiaoyuan Zhang,
I am very interested in your project. This year, my research group published a paper titled "A Hyper-Transformer Model for Controllable Pareto Front Learning with Split Feasibi…
-
![5D190FA6EE718064BEC8DBD812DCF1B3](https://github.com/user-attachments/assets/f7fd1920-6046-46e7-9162-f6b30ee15a8e)
I downloaded siglip-so400m-patch14-384 and filled in the path. What else do I n…
-
I read in the paper that you also used this method on a T5 variant model. How did you treat the embedding layer and the output layer of the model?
-
I would like to record some model activations in an architecture-invariant way.
In PyTorch, we can use [forward hooks](https://pytorch.org/docs/stable/generated/torch.nn.modules.module.register_modul…
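A minimal sketch of the hook-based approach described above: register a forward hook on every leaf module so the recording logic never references a specific architecture. The helper name `record_activations` and the stand-in `nn.Sequential` model are illustrative, not part of any existing codebase.

```python
import torch
import torch.nn as nn

def record_activations(model: nn.Module, inputs: torch.Tensor):
    """Run one forward pass and capture each leaf module's output by name."""
    activations = {}
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Module outputs may be tensors or tuples; keep plain tensors only.
            if isinstance(output, torch.Tensor):
                activations[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        # Hook leaf modules only, so container modules don't duplicate captures.
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        model(inputs)
    finally:
        # Always remove hooks so repeated calls don't stack them.
        for h in handles:
            h.remove()
    return activations

# Usage with a tiny stand-in model (any nn.Module works the same way):
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
acts = record_activations(model, torch.randn(2, 8))
```

Because `named_modules()` walks the module tree generically, the same helper works unchanged on a Transformer, a CNN, or any other `nn.Module`.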
-
My model is
```json
{
"mlp_bias": false,
"attn_bias": false,
"rotary_base": 300000,
"rotary_scaling": null,
"residual_mlp": false,
"disable_weight_only_quant_plugin": false,
…
```
-
Error message:
```
Traceback (most recent call last):
File "./graphgpt/eval/run_graphgpt.py", line 244, in
run_eval(args, args.num_gpus)
File "./graphgpt/eval/run_graphgpt.py", line 98, in run_ev…
```
-
Why do the modified transformer model and the original transformer model have the same code in the timing prediction folder?
-
**Description**
Hi, I am trying to test the model with the original settings but ran into this issue. It seems something could be wrong with the embedding model, but I have no idea what. This problem occurs ei…
-
When I use stories15M or stories110M, I get an error.
```
File "D:\_LLM_project\Development\gpt-fast\generate.py", line 114, in speculative_decode
torch.cat([cur_token.view(1), draft_tokens])…
```
-
### Issue Type
Documentation Bug
### Source
source
### Keras Version
2.14
### Custom Code
Yes
### OS Platform and Distribution
Ubuntu 22.04
### Python version
3.10
…