chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License

HuggingFace Validation not working as expected #195

Open deveshjawla opened 6 days ago

deveshjawla commented 6 days ago

Hi Peter @chengchingwen,

Could you please help with the HuggingFace Validation not working as expected?

When validating other models such as RoBERTa, Bloom, etc., the validation works fine, but when I add DistilBERT it gives the following error:

[ Info: Loading python packages
[ Info: Python packages loaded successfully
[ Info: Load configure file in Python
[ Info: Load configure file in Julia
[ Info: Validate distilbert/distilbert-base-cased based model
[ Info: Loading based model in Python
[ Info: Python model loaded successfully
[ Info: Loading based model in Julia
┌ Error: Failed to load the model in Julia
└ @ Main ~/Projects/Transformers.jl/example/HuggingFaceValidation/utils.jl:2
Based Model: Error During Test at /Users/456828/Projects/Transformers.jl/example/HuggingFaceValidation/based_model.jl:6
  Got exception outside of a @test
  Unknown model type: distilbert
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] _get_model_type(model_type::Symbol)
      @ Transformers.HuggingFace ~/.julia/packages/Transformers/qH1VW/src/huggingface/models/models.jl:35
    [3] macro expansion
      @ ~/.julia/packages/Transformers/qH1VW/src/huggingface/models/models.jl:40 [inlined]
    [4] #1071#default_f
      @ ~/.julia/packages/ValSplit/Qe1Uy/src/ValSplit.jl:185 [inlined]
    [5] macro expansion
      @ ~/.julia/packages/ValSplit/Qe1Uy/src/ValSplit.jl:0 [inlined]
    [6] _valswitch
      @ ~/.julia/packages/ValSplit/Qe1Uy/src/ValSplit.jl:137 [inlined]
    [7] get_model_type
      @ ~/.julia/packages/ValSplit/Qe1Uy/src/ValSplit.jl:187 [inlined]
    [8] get_model_type(model_type::Symbol, task::Symbol)
      @ Transformers.HuggingFace ~/.julia/packages/Transformers/qH1VW/src/huggingface/models/models.jl:43 
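
The stacktrace suggests the failure happens in the Val-based dispatch of `get_model_type`: an unregistered model-type symbol falls through to an error default. A minimal, purely illustrative sketch of that dispatch pattern (hedged: these method bodies are not the actual Transformers.jl source, just a reconstruction of the failure mode implied by the trace):

```julia
# Registered model types each get a Val method; anything else hits the fallback.
_get_model_type(::Val{:bert}) = :bert
_get_model_type(v::Val{S}) where S = error("Unknown model type: ", S)

_get_model_type(Val(:bert))          # dispatches to the registered method
# _get_model_type(Val(:distilbert))  # would throw "Unknown model type: distilbert"
```

So even though the DistilBert layers exist, the validation path errors out if `:distilbert` was never registered with this dispatch mechanism.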

However, if I load the DistilBert implementation directly from the REPL, I am able to load it:

Transformers.HuggingFace.load_model("distilbert/distilbert-base-cased", :model, state_dict; config= conf)

HGFDistilBertModel(
  Chain(
    CompositeEmbedding(
      token = Embed(768, 28996),        # 22_268_928 parameters
      position = ApplyEmbed(.+, FixedLenPositionEmbed(768, 512), Main.Transformers.HuggingFace.distilbert_pe_indices(0,)),  # 393_216 parameters
      segment = ApplyEmbed(.+, Embed(768, 512), Main.Transformers.HuggingFace.bert_ones_like),  # 393_216 parameters
    ),
    DropoutLayer<nothing>(
      LayerNorm(768, ϵ = 1.0e-12),      # 1_536 parameters
    ),
  ),
  Transformer<6>(
    PostNormTransformerBlock(
      DropoutLayer<nothing>(
        SelfAttention(
          MultiheadQKVAttenOp(head = 12, p = nothing),
          Fork<3>(Dense(W = (768, 768), b = true)),  # 1_771_776 parameters
          Dense(W = (768, 768), b = true),  # 590_592 parameters
        ),
      ),
      LayerNorm(768, ϵ = 1.0e-12),      # 1_536 parameters
      DropoutLayer<nothing>(
        Chain(
          Dense(σ = NNlib.gelu, W = (768, 3072), b = true),  # 2_362_368 parameters
          Dense(W = (3072, 768), b = true),  # 2_360_064 parameters
        ),
      ),
      LayerNorm(768, ϵ = 1.0e-12),      # 1_536 parameters
    ),
  ),                  # Total: 96 arrays, 42_527_232 parameters, 162.238 MiB.
  Branch{(:pooled,) = (:hidden_state,)}(
    BertPooler(Dense(σ = NNlib.tanh_fast, W = (768, 768), b = true)),  # 590_592 parameters
  ),
)                   # Total: 103 arrays, 66_174_720 parameters, 252.448 MiB.
chengchingwen commented 17 hours ago

Can you check whether the model correctly subtypes HuggingFace.HGFPreTrained{:distilbert}?

https://github.com/chengchingwen/Transformers.jl/blob/12ed656d75aa1d8f29f44145c9fed1e76a39bebe/src/huggingface/models/models.jl#L33-L36
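
That check can be run from the REPL. This is a sketch based on the type name printed in the model output above (`HGFDistilBertModel`) and the parameterization named in the comment; the exact qualified names are an assumption, not verified against the package source:

```julia
using Transformers
const HF = Transformers.HuggingFace

# true  => the model struct is tied to the :distilbert model-type symbol
# false => would explain why get_model_type(:distilbert, :model) cannot find it
HF.HGFDistilBertModel <: HF.HGFPreTrained{:distilbert}
```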