Do you get the correct result under FP32? I tried to reproduce your problem but could not. Using your prompt, I get the following output with both FP32 and FP16 when the output length is set to 64.
The Belgian national football team is the official name of a selection made by Belgium's Football Federation (, , ) to play in international matches. The current head coach and manager are Roberto Martínez who took over from Marc Wilmots on 1 June 2019 after he was sacked following their exit at Euro 2020 qualifying Group A stage
I don't use the FP16 checkpoint because the converter will convert it back to FP32.
@byshiue Thank you for your quick response!
Do you get correct result under FP32?
I am trying to test accuracy using FP32, but my objective is to use an FP16 checkpoint, because some Hugging Face repositories provide only FP16 weights for some models (e.g., https://huggingface.co/NovelAI/genji-jp/tree/main).
I don't use the FP16 checkpoint because the converter will convert it back to FP32.
Yes, I am aware that tensors are converted to FP32 before being saved. I also tried saving the tensors in FP16, but FasterTransformer cannot load such tensors...
Modified savebin function:
def savebin(param, save_path):
    if isinstance(param, torch.Tensor):
        param = param.cpu().float().numpy()
    # Changed here: write the weights as FP16 instead of FP32.
    np.squeeze(param).astype(np.float16).tofile(save_path + ".bin")
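A file written this way holds 2 bytes per element instead of 4, so each weight file is half the size of its FP32 counterpart. The sketch below illustrates that arithmetic with NumPy only. The helper name `saved_size_bytes` and the constant `HIDDEN_SIZE` are made up for this example, and the value 4096 is GPT-J-6B's hidden size. That the loader expects 4-byte elements is an inference from the byte counts in the warnings reported in this thread, not something confirmed here.

```python
import os
import tempfile

import numpy as np

# GPT-J-6B hidden size; a layernorm bias has this many elements.
HIDDEN_SIZE = 4096


def saved_size_bytes(values: np.ndarray, dtype) -> int:
    """Write `values` with ndarray.tofile in the given dtype and return the file size."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "weight.bin")
        values.astype(dtype).tofile(path)
        return os.path.getsize(path)


# Stand-in for a layernorm bias; the values do not matter, only the element count.
bias = np.zeros(HIDDEN_SIZE, dtype=np.float32)

fp16_bytes = saved_size_bytes(bias, np.float16)  # 2 bytes per element
fp32_bytes = saved_size_bytes(bias, np.float32)  # 4 bytes per element

print(fp16_bytes, fp32_bytes)  # 8192 16384
```

The two numbers match the "only has 8192, but request 16384" pattern reported for the layernorm files, which is consistent with a reader that requests 4 bytes per element while the file stores 2.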
With this modified savebin, loading the model fails and the following warnings are raised:
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.wte.bin only has 412876800, but request 825753600, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.final_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.final_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.lm_head.weight.bin only has 412876800, but request 825753600, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.lm_head.bias.bin only has 100800, but request 201600, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.0.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.1.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.2.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.3.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.4.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.5.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.6.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.7.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.8.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.9.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.10.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.11.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.12.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.13.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.14.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.15.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.16.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.17.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.18.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.19.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.20.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.21.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.22.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.23.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.24.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.25.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.26.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.input_layernorm.bias.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.input_layernorm.weight.bin only has 8192, but request 16384, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.attention.query_key_value.weight.0.bin only has 100663296, but request 201326592, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.attention.dense.weight.0.bin only has 33554432, but request 67108864, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.mlp.dense_h_to_4h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.mlp.dense_h_to_4h.bias.0.bin only has 32768, but request 65536, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.mlp.dense_4h_to_h.weight.0.bin only has 134217728, but request 268435456, loading model fails!
[WARNING] file /workspace/triton-model-store/fastertransformer/1/gpt-j-6b/model.layers.27.mlp.dense_4h_to_h.bias.bin only has 8192, but request 16384, loading model fails!
Do you mean that FasterTransformer converts FP32 parameters to FP16 dynamically in FP16 inference mode? If so, does FasterTransformer not support FP16 checkpoints?
Yes. FT assumes that the checkpoint is always in FP32. If you set is_half=1, FT will convert the model to FP16 while loading it.
As far as I know, the model you provided above is available with both FP32 and FP16 weights, so we can test both first to make sure that the additional cast during conversion does not affect the result.
If the cast during conversion does affect the results, you can try modifying the loading function loadWeightFromBin in memory_utils.h to load the FP16 weights.
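The warnings above always report a file that holds exactly half the requested bytes (for example 8192 versus 16384), which is what you would see if 2-byte FP16 values are read by a loader that expects 4-byte FP32 values. A minimal sketch of a converter-side workaround, assuming the .bin files are raw arrays with no header; guess_bin_dtype and upcast_bin_to_fp32 are hypothetical helper names, not part of FasterTransformer:

```python
import os

import numpy as np


def guess_bin_dtype(path, num_elements):
    """Guess the dtype of a raw weight file from its size (hypothetical helper)."""
    size = os.path.getsize(path)
    if size == num_elements * 4:
        return np.float32
    if size == num_elements * 2:
        return np.float16
    raise ValueError(f"{path}: {size} bytes does not match {num_elements} elements")


def upcast_bin_to_fp32(src_path, dst_path):
    """Read a raw FP16 file and write the same values as raw FP32 (hypothetical helper)."""
    weights = np.fromfile(src_path, dtype=np.float16)
    weights.astype(np.float32).tofile(dst_path)
```

Upcasting FP16 to FP32 is lossless, so this only changes the file size, not the stored values.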
@byshiue Sure. I'll try to modify memory_utils.h.
By the way, I have finished testing accuracy with the FP32 checkpoint, but it failed...
I used the following FP32 checkpoint, renamed model.pt, and followed the procedure mentioned above.
Generated: The Belgian national football team is the national football team of Belgium. It is controlled by the Belgian Football Association of the Belgian Football Association of Football Association (Federation of Football Federation (Federation of Wallonia, the Belgian Football Association (Federation (Federation (Federation (Federation (Federation) and the Belgian Football Association (F
Which weights are needed to generate the sentences you generated?
What do you mean by "failed"? That the model cannot be converted, or that it cannot generate correct results? I followed gptj_guide.md to download and convert the model.
@byshiue "failed" means that the generated sentences are not correct, like here.
Generated: The Belgian national football team is the national football team of Belgium. It is controlled by the Belgian Football Association of the Belgian Football Association of Football Association (Federation of Football Federation (Federation of Wallonia, the Belgian Football Association (Federation (Federation (Federation (Federation (Federation) and the Belgian Football Association (F
I followed gptj_guide.md to download and convert the model.
Oh, I see. If you don't mind, could you please try the reproduction method I described? FasterTransformer doesn't seem to work well if you simply convert the checkpoint provided by Huggingface.
Sorry, I misunderstood something. As you say, the FT converter does not support the Huggingface checkpoint. So, if you want to load the Huggingface model, you need to modify the converter. Thus, I think it is not a precision problem but a conversion problem.
GPT-J also has a different architecture from a vanilla decoder transformer. Does FasterTransformer support different architectures too?
FT supports GPT-J, standard encoder-decoder, BERT, Longformer and T5.
@byshiue
So, if you want to load the Huggingface model, you need to modify the converter. Thus, I think it is not a precision problem but a conversion problem.
The difference between the Huggingface and original gpt-j-6b checkpoints is just the layer names, right?
I have checked the differences between the values after converting the parameters to *.bin format and confirmed that the two sets of values are almost the same.
huggingface: [-0.00404739 0.01963806 -0.00400543 -0.00257874 -0.00688553 0.02464294 0.01841736 -0.02111816 0.0171814 -0.00888824]
original: [-0.00405884 0.01960754 -0.00401306 -0.00258636 -0.0068779 0.0246582 0.01843262 -0.02108765 0.01715088 -0.00888824]
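To put a number on "almost the same", the ten values posted above can be compared directly. This is only a sketch on that one slice; the tolerance of 1e-4 is an assumption chosen for illustration, not a threshold taken from FasterTransformer:

```python
import numpy as np

# The ten values posted above for the same parameter slice.
huggingface = np.array([-0.00404739, 0.01963806, -0.00400543, -0.00257874, -0.00688553,
                        0.02464294, 0.01841736, -0.02111816, 0.0171814, -0.00888824])
original = np.array([-0.00405884, 0.01960754, -0.00401306, -0.00258636, -0.0068779,
                     0.0246582, 0.01843262, -0.02108765, 0.01715088, -0.00888824])

# Largest element-wise gap between the two checkpoints on this slice.
max_abs_diff = np.abs(huggingface - original).max()

# Assumed tolerance: 1e-4 absolute, no relative term.
values_match = np.allclose(huggingface, original, atol=1e-4, rtol=0)
```

On this slice the gaps are a few times 1e-5 at most, so a difference this small is unlikely to explain the repeated "Federation (Federation (" output by itself.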
Do you mean that the following conversion script from *.pt to *.bin is wrong?
Name mapping is successful with the mentioned script.
The difference between the Huggingface and original gpt-j-6b checkpoints is just the layer names, right?
I don't know. If you think the difference is just the layer names, you can try:
1. Run the original gpt-j with FT to verify the correctness. (We have tested this case and it should work.)
2. Convert the Huggingface gpt-j to the original gpt-j format by name mapping and verify that both checkpoints are the same.
If 1 and 2 are correct, then you should be able to run the Huggingface gpt-j.
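The "verify both checkpoints are the same" step can be checked over whole output directories rather than a single slice. A minimal sketch, assuming both conversions write raw FP32 .bin files with identical file names; compare_bin_dirs is a hypothetical helper and the 1e-3 tolerance is an assumption:

```python
import os

import numpy as np


def compare_bin_dirs(dir_a, dir_b, dtype=np.float32, atol=1e-3):
    """Return {file_name: reason} for every .bin file that differs (hypothetical helper)."""
    problems = {}
    names_a = {n for n in os.listdir(dir_a) if n.endswith(".bin")}
    names_b = {n for n in os.listdir(dir_b) if n.endswith(".bin")}

    # Files that exist on one side only usually point to a name-mapping gap.
    for name in sorted(names_a ^ names_b):
        problems[name] = "present in only one directory"

    # Files on both sides are compared by element count, then by value.
    for name in sorted(names_a & names_b):
        a = np.fromfile(os.path.join(dir_a, name), dtype=dtype)
        b = np.fromfile(os.path.join(dir_b, name), dtype=dtype)
        if a.shape != b.shape:
            problems[name] = f"size mismatch: {a.size} vs {b.size}"
        elif not np.allclose(a, b, atol=atol, rtol=0):
            problems[name] = f"max abs diff {np.abs(a - b).max():.3e}"
    return problems
```

An empty result means every file matched within the tolerance. Note that this compares flattened values only, so a transposed weight with the same element count would show up as a value mismatch rather than a shape mismatch.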
Hi, can you try the tag dev/v5.0_beta_2021.09_tag?
@byshiue
Thank you for the information!
As you suggested, I obtained accurate outputs by using dev/v5.0_beta_2021.09_tag.
Thank you for your help!
Sorry, it seems there are some bugs in the latest code. I will fix them as soon as possible.
Information
I want to run the GPT-J model in FP16 precision (https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16) on FasterTransformer + Triton, but I am having trouble with the accuracy. For example, the following sentences are generated when running the sample script below with FasterTransformer.
sample scripts https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/GPT-J-6B/Inference_with_GPT_J_6B.ipynb#scrollTo=RdOynYcY8jb1
generated sentences
Environment
To reproduce
- Download the gpt-j-6b model in float16 (pytorch_model.bin), rename it to gpt-j.pt, and store it at gpt-j/gpt-j.pt. https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16
- Convert the PyTorch model to FasterTransformer format via the following scripts on the Docker image nvcr.io/nvidia/pytorch:21.07-py3.
- Put config.pbtxt at triton-model-store/fastertransformer/config.pbtxt. Note that I have changed temperature to 0.9 from the sample GPT-J config.
- build
- run (NOTE that the current directory includes the triton-model-store directory)
- In the above image, run Triton
- Run the following chat.py via python3 chat.py
Expected Behavior
The following reference outputs accurate sentences.
Ref: https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/GPT-J-6B/Inference_with_GPT_J_6B.ipynb#scrollTo=RdOynYcY8jb1
output:
Related Issue
NVIDIA/FasterTransformer#172