chengchingwen / Transformers.jl

Julia Implementation of Transformer models

scibert models missing loading_method? #91

Closed (robertfeldt closed this issue 1 year ago)

robertfeldt commented 2 years ago

I can load all the bert models but none of the scibert ones:

julia> bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson 
...

julia> bert_model, wordpiece, tokenizer = pretrain"scibert-scibert_scivocab_uncased"
ERROR: unknown pretrain type
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] loading_method(x::Val{:scibert})
   @ Transformers.Pretrain ~/.julia/packages/Transformers/jtjKq/src/pretrain/Pretrain.jl:46
 [3] load_pretrain(str::String; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Transformers.Pretrain ~/.julia/packages/Transformers/jtjKq/src/pretrain/Pretrain.jl:58
 [4] load_pretrain(str::String)
   @ Transformers.Pretrain ~/.julia/packages/Transformers/jtjKq/src/pretrain/Pretrain.jl:57
 [5] top-level scope
   @ REPL[12]:1

Seems there is no loading_method for :scibert.
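
To make the failure mode concrete, the dispatch the stack trace points to looks roughly like the sketch below. This is illustrative only, not the package's actual source; only the function name loading_method, its Val-based signature, and the error message are taken from the trace above, and the return values are placeholders.

loading_method(::Val{:bert}) = :bert_loader         # placeholder return value
loading_method(::Val{:gpt})  = :gpt_loader          # placeholder return value
loading_method(x) = error("unknown pretrain type")  # generic fallback

loading_method(Val(:bert))     # => :bert_loader
loading_method(Val(:scibert))  # ERROR: unknown pretrain type, as above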

robertfeldt commented 2 years ago

OK, I just realized the scibert models are treated as bert models and can therefore be loaded with the bert loading_method, so this works:

bert_model, wordpiece, tokenizer = pretrain"bert-scibert_scivocab_uncased"
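
For completeness, here is a minimal usage sketch of the returned triple, following the package's older Pretrain/Basic examples; the sample sentence is made up, and Vocabulary plus the tokenizer |> wordpiece pipeline are taken from those examples.

using Transformers
using Transformers.Basic
using Transformers.Pretrain

# ENV["DATADEPS_ALWAYS_ACCEPT"] = true  # may be needed for the first download

bert_model, wordpiece, tokenizer = pretrain"bert-scibert_scivocab_uncased"
vocab = Vocabulary(wordpiece)

# Tokenize a sample sentence and map the wordpieces to vocabulary indices.
pieces  = "Transformer models for scientific text" |> tokenizer |> wordpiece
tokens  = ["[CLS]"; pieces; "[SEP]"]
indices = vocab(tokens)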

What led me astray was the output of the pretrains method, which made me think one should concatenate the 2nd and 3rd columns to get the model name to pass to pretrain.

julia> pretrains()
  Type model   model name                   support items                   
  –––– ––––––– –––––––––––––––––––––––––––– ––––––––––––––––––––––––––––––––
  Gpt  gpt     OpenAIftlm                   gpt_model, bpe, vocab, tokenizer
  Bert scibert scibert_scivocab_uncased     bert_model, wordpiece, tokenizer
  Bert scibert scibert_basevocab_cased      bert_model, wordpiece, tokenizer
  Bert scibert scibert_basevocab_uncased    bert_model, wordpiece, tokenizer
  Bert scibert scibert_scivocab_cased       bert_model, wordpiece, tokenizer
  Bert bert    cased_L-12_H-768_A-12        bert_model, wordpiece, tokenizer
  Bert bert    wwm_cased_L-24_H-1024_A-16   bert_model, wordpiece, tokenizer
  Bert bert    uncased_L-12_H-768_A-12      bert_model, wordpiece, tokenizer
  Bert bert    multi_cased_L-12_H-768_A-12  bert_model, wordpiece, tokenizer
  Bert bert    wwm_uncased_L-24_H-1024_A-16 bert_model, wordpiece, tokenizer
  Bert bert    multilingual_L-12_H-768_A-12 bert_model, wordpiece, tokenizer
  Bert bert    chinese_L-12_H-768_A-12      bert_model, wordpiece, tokenizer
  Bert bert    cased_L-24_H-1024_A-16       bert_model, wordpiece, tokenizer
  Bert bert    uncased_L-24_H-1024_A-16     bert_model, wordpiece, tokenizer
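
In other words, the prefix in the pretrain string corresponds to the Type column (lowercased) rather than the model column, so the remaining scibert rows should presumably load the same way, e.g.:

bert_model, wordpiece, tokenizer = pretrain"bert-scibert_scivocab_cased"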

Possibly this could be clarified in the output and/or documentation. Sorry if I just missed it.

chengchingwen commented 2 years ago

It is written in the docstring for @pretrain_str, but I agree that might be a little misleading.
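
For anyone landing here later, that docstring can be read from the REPL's help mode, e.g.:

julia> using Transformers.Pretrain

help?> @pretrain_str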