Open pat-alt opened 8 months ago
There is an `output_hidden_states` configuration that can be set with `HGFConfig`:

```julia
model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
cfg = HuggingFace.HGFConfig(load_config(model_name); output_hidden_states = true)
mod = load_model(model_name, "ForSequenceClassification"; config = cfg)
```

Then you can access all layer outputs with `mod(a).outputs`, which is an `NTuple{number_layers, @NamedTuple{hidden_state::Array{Float32, 3}}}`.
Another similar configuration is `output_attentions`, which would also include the attention scores in the named tuples in `.outputs`.
BTW, if you don't need the sequence classification head, you can simply use `load_model(model_name; config = cfg)`, which extracts the model part without the classification layers.
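For reference, a minimal end-to-end sketch of the above (the `load_tokenizer`/`encode` calls follow the usual Transformers.jl workflow; the input sentence is just illustrative):

```julia
using Transformers, Transformers.HuggingFace

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
textenc = load_tokenizer(model_name)
cfg = HuggingFace.HGFConfig(load_config(model_name); output_hidden_states = true)
mod = load_model(model_name; config = cfg)  # bare model, no classification head

a = encode(textenc, "Shares fell sharply after the earnings call.")
out = mod(a)
# one entry per transformer layer, each carrying a hidden_state array
hidden_states = [o.hidden_state for o in out.outputs]
```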
Amazing, thanks very much for the quick response 👍🏽
(I won't close this since you added the documentation tag.)
Small follow-up question: is it also somehow possible to collect outputs for each layer of the classifier head?

Edit: I realize I can just break the forward pass down into layer-by-layer calls as below, but perhaps there's a more streamlined way to do this?

```julia
b = clf.layer.layers[1](b).hidden_state |>
    x -> clf.layer.layers[2](x)
```
You can try extracting the actual layers from the classifier head, constructing a `Flux.Chain`, and calling it with `Flux.activations`. Otherwise, I think manual loops/calls are probably the simplest.
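For instance, a sketch building on the `clf` object from the snippet above (the `clf.layer.layers` field path is taken from that snippet; the anonymous function unwraps the first layer's `hidden_state` field):

```julia
using Flux

# Rebuild the head as a plain Chain so Flux.activations can
# return every intermediate output in one call.
head = Flux.Chain(x -> clf.layer.layers[1](x).hidden_state,
                  clf.layer.layers[2])
acts = Flux.activations(head, b)  # tuple: (layer-1 output, layer-2 output)
```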
Thanks for the great package @chengchingwen 🙏🏽
I have a somewhat naive question that you might be able to help me with. For a project I'm currently working on, I am trying to run linear probes on layer activations. In particular, I'm trying to reproduce the following exercise from this paper:
I've naively tried to simply apply the `Flux.activations()` function with no luck. Here's an example:

Any advice would be much appreciated!
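(Not from the thread, but for context:) once the per-layer activations are collected, a linear probe over them might look like the following in Flux. The names `X`, `y` and the hidden size 768 are placeholder assumptions, not from the original post:

```julia
using Flux

# Sketch of a linear probe on cached activations.
# Assumptions: X is a 768 × n Float32 matrix of activations from one
# layer (e.g. mean-pooled over tokens), y is a length-n label vector in 1:2.
probe = Flux.Dense(768 => 2)  # the probe is just a linear map
loss(m, x, y) = Flux.logitcrossentropy(m(x), Flux.onehotbatch(y, 1:2))
opt = Flux.setup(Flux.Adam(1e-3), probe)
for _ in 1:100
    grads = Flux.gradient(m -> loss(m, X, y), probe)
    Flux.update!(opt, probe, grads[1])
end
```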