Closed lamashnikov closed 1 month ago
After digging a little, it seems that quantized models' state dicts contain a lot more than "normal" models': besides tensors, they hold metadata such as the packed parameters' dtype, and safetensors serialization wasn't expecting such metadata. The parameter names also change between the quantized and non-quantized versions.
Quantized state dict (`q_model.state_dict()`):

```
<class 'torch.dtype'> 0.auto_model.embeddings.word_embeddings._packed_params.dtype
<class 'torch.Tensor'> 0.auto_model.embeddings.word_embeddings._packed_params._packed_weight
<class 'torch.dtype'> 0.auto_model.embeddings.position_embeddings._packed_params.dtype
<class 'torch.Tensor'> 0.auto_model.embeddings.position_embeddings._packed_params._packed_weight
<class 'torch.dtype'> 0.auto_model.embeddings.token_type_embeddings._packed_params.dtype
<class 'torch.Tensor'> 0.auto_model.embeddings.token_type_embeddings._packed_params._packed_weight
<class 'torch.Tensor'> 0.auto_model.embeddings.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.embeddings.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.0.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.0.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.1.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.1.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.2.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.2.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.3.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.3.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.4.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.4.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.query.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.query.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.attention.self.query._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.attention.self.query._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.key.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.key.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.attention.self.key._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.attention.self.key._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.value.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.value.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.attention.self.value._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.attention.self.value._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.attention.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.attention.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.intermediate.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.intermediate.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.intermediate.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.intermediate.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.dense.scale
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.dense.zero_point
<class 'torch.dtype'> 0.auto_model.encoder.layer.5.output.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.encoder.layer.5.output.dense._packed_params._packed_params
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.pooler.dense.scale
<class 'torch.Tensor'> 0.auto_model.pooler.dense.zero_point
<class 'torch.dtype'> 0.auto_model.pooler.dense._packed_params.dtype
<class 'tuple'> 0.auto_model.pooler.dense._packed_params._packed_params
```
Non-quantized state dict:

```
<class 'torch.Tensor'> 0.auto_model.embeddings.word_embeddings.weight
<class 'torch.Tensor'> 0.auto_model.embeddings.position_embeddings.weight
<class 'torch.Tensor'> 0.auto_model.embeddings.token_type_embeddings.weight
<class 'torch.Tensor'> 0.auto_model.embeddings.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.embeddings.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.0.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.1.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.2.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.3.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.4.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.query.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.query.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.key.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.key.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.value.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.self.value.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.attention.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.intermediate.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.intermediate.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.dense.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.dense.bias
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.LayerNorm.weight
<class 'torch.Tensor'> 0.auto_model.encoder.layer.5.output.LayerNorm.bias
<class 'torch.Tensor'> 0.auto_model.pooler.dense.weight
<class 'torch.Tensor'> 0.auto_model.pooler.dense.bias
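The difference can be reproduced without sentence-transformers at all. A minimal sketch (a toy model, not paraphrase-MiniLM-L6-v2; assuming PyTorch's dynamic quantization, which is what the tutorial uses) that lists the non-tensor state-dict entries:

```python
import torch
import torch.nn as nn

# Dynamically quantize a single Linear layer and list the state-dict
# entries that are not tensors -- these are what safetensors chokes on.
q_model = torch.ao.quantization.quantize_dynamic(
    nn.Sequential(nn.Linear(4, 2)), {nn.Linear}, dtype=torch.qint8
)

non_tensor_keys = [
    k for k, v in q_model.state_dict().items() if not isinstance(v, torch.Tensor)
]
# Expect keys like '0._packed_params.dtype' (a torch.dtype) and
# '0._packed_params._packed_params' (a tuple holding the packed weight and bias).
print(non_tensor_keys)
```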
The place where it explodes (expecting a tensor, it gets metadata) is `_find_disjoint` in `transformers/modeling_utils.py`, line 650:

```python
for name in shared:
    tensor = state_dict[name]
    areas.append((tensor.data_ptr(), _end_ptr(tensor), name))
areas.sort()
```
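The failure is easy to see in isolation: the quantized state dict stores `torch.qint8` (a `torch.dtype`) under the keys ending in `_packed_params.dtype`, and only tensors have a `data_ptr()` method:

```python
import torch

# torch.qint8 is a torch.dtype object, not a tensor, so the data_ptr()
# call in _find_disjoint raises AttributeError.
try:
    torch.qint8.data_ptr()
    err = None
except AttributeError as e:
    err = str(e)
print(err)
```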
So it seems this isn't actually a Sentence-Transformers issue.
Dear Maintainers,
I followed this tutorial to quantize the paraphrase-MiniLM-L6-v2 model: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py. It works great: the model performs well, takes much less space, and I'm very happy with it.
But when I try to save it (with save_to_hub or save_pretrained), I get this error:
```
Traceback (most recent call last):
  File "/home/censored/perso/./my_project.py", line 190, in <module>
    main()
  File "/home/censored/perso/./my_project.py", line 171, in main
    q_model.save_pretrained("lamashnikov/cls-quantitized-paraphrase-MiniLM-L6-v2")
  File "/home/censored/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1072, in save_pretrained
    self.save(
  File "/home/censored/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1037, in save
    module.save(model_path, safe_serialization=safe_serialization)
  File "/home/censored/.local/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 180, in save
    self.auto_model.save_pretrained(output_path, safe_serialization=safe_serialization)
  File "/home/censored/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2698, in save_pretrained
    shared_names, disjoint_names = _find_disjoint(shared_ptrs.values(), state_dict)
  File "/home/censored/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 650, in _find_disjoint
    areas.append((tensor.data_ptr(), _end_ptr(tensor), name))
AttributeError: 'torch.dtype' object has no attribute 'data_ptr'
```
I'm able to save it in pickle format and restore it, but I wanted to push it to the Hub, so it's kind of annoying.
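For reference, the pickle path can be sketched with plain torch on a toy dynamically quantized model (an illustration, not the actual sentence-transformers call): `torch.save`/`torch.load` round-trips the packed params that safetensors rejects.

```python
import io
import torch
import torch.nn as nn

# Toy stand-in for the quantized model (assumption: dynamic quantization,
# as in the tutorial).
q_model = torch.ao.quantization.quantize_dynamic(
    nn.Sequential(nn.Linear(4, 2)), {nn.Linear}, dtype=torch.qint8
)

# Pickle-based serialization accepts the non-tensor entries...
buf = io.BytesIO()
torch.save(q_model.state_dict(), buf)

# ...and restores them intact (weights_only=False because the dict holds
# arbitrary pickled objects, not just tensors).
buf.seek(0)
restored = torch.load(buf, weights_only=False)
print(sorted(restored) == sorted(q_model.state_dict()))
```

In sentence-transformers, the corresponding escape hatch appears to be the `safe_serialization` flag visible in the traceback above (e.g. `q_model.save(path, safe_serialization=False)`), at the cost of losing the safetensors format.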
I can't save it in safetensors format either:

```
save_file(q_model.state_dict(), 'model.safetensors')
  File "/home/censored/.local/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/home/censored/.local/lib/python3.10/site-packages/safetensors/torch.py", line 470, in _flatten
    raise ValueError(f"Key `{k}` is invalid, expected torch.Tensor but received {type(v)}")
ValueError: Key `0.auto_model.embeddings.word_embeddings._packed_params.dtype` is invalid, expected torch.Tensor but received <class 'torch.dtype'>
```

Do you have any hints to make it work? Do you know what's happening? My google-fu didn't help me there, and I'm sorry to bother you with this.
Regards