aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

compiling huggingface transformer pipeline #317

Closed subhamkhemka closed 3 years ago

subhamkhemka commented 3 years ago

Hi

I want to run a zero-shot classification task. I am using the Hugging Face transformers pipeline for this task.

from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="typeform/distilbert-base-uncased-mnli")
# example inputs (values illustrative; not defined in the original snippet)
sequence_to_classify = "one day I will see the world"
candidate_labels = ["travel", "cooking", "dancing"]
classifier(sequence_to_classify, candidate_labels)

How do I use this pipeline to accelerate inference with torch_neuron?

I have read the docs but am not sure how I would compile and load the model when directly referencing a transformers pipeline.

Kindly help.

Thanks, Subham

aws-zejdaj commented 3 years ago

Hi Subham,

You can find a sample of how to compile and use a Neuron model in a HuggingFace pipeline in our TensorFlow 2 tutorial ( https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html ). You can use the same strategy to define the compiled model in PyTorch. Please let us know if this answers your question!
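
Adding a sketch for completeness, since the question asked about torch_neuron: the same strategy on the PyTorch side might look like the following (the sequence length, output file name, and input handling here are illustrative assumptions, not taken from the tutorial):

import torch
import torch_neuron  # registers the torch.neuron namespace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "typeform/distilbert-base-uncased-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True makes the model return plain tuples, which tracing requires
model = AutoModelForSequenceClassification.from_pretrained(model_name, torchscript=True)
model.eval()

# Inferentia requires static input shapes, so pad/truncate to a fixed length
example = tokenizer("one day I will see the world",
                    padding="max_length", max_length=128,
                    truncation=True, return_tensors="pt")
example_inputs = (example["input_ids"], example["attention_mask"])

# compile the model for Inferentia and save the compiled artifact
neuron_model = torch.neuron.trace(model, example_inputs)
neuron_model.save("distilbert_mnli_neuron.pt")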

subhamkhemka commented 3 years ago

Hi

I am getting an error when deploying the model from the sample you shared:

#now we can insert the neuron_model and replace the cpu model
#so now we have a huggingface pipeline that uses an underlying neuron model!
neuron_pipe.model = neuron_model
neuron_pipe.model.config = pipe.model.config

#Now let's run inference on neuron!
neuron_pipe('I want this sentence to be negative to show a negative sentiment analysis.')
UnavailableError: 2 root error(s) found.
  (0) Unavailable:  grpc server unix:/run/neuron.sock is unavailable. Please check the status of neuron-rtd service by `systemctl is-active neuron-rtd`. If it shows `inactive`, please install the service by `sudo apt-get install aws-neuron-runtime`. If `aws-neuron-runtime` is already installed, you may activate neuron-rtd service by `sudo systemctl restart neuron-rtd`.
     [[node neuron_op_10d5affb7a47741c (defined at /home/ubuntu/neuron_tf2_env/lib/python3.6/site-packages/tensorflow_neuron/python/_trace.py:456) ]]
  (1) Unavailable:  grpc server unix:/run/neuron.sock is unavailable. Please check the status of neuron-rtd service by `systemctl is-active neuron-rtd`. If it shows `inactive`, please install the service by `sudo apt-get install aws-neuron-runtime`. If `aws-neuron-runtime` is already installed, you may activate neuron-rtd service by `sudo systemctl restart neuron-rtd`.
     [[node neuron_op_10d5affb7a47741c (defined at /home/ubuntu/neuron_tf2_env/lib/python3.6/site-packages/tensorflow_neuron/python/_trace.py:456) ]]
     [[neuron_op_10d5affb7a47741c/_6]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_9321]

Function call stack:
pruned -> pruned

I tried running the commands below:

sudo apt-get install aws-neuron-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
aws-neuron-runtime is already the newest version (1.6.5.0).
The following packages were automatically installed and are no longer required:
  libaio1 librados2 librbd1
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.

systemctl is-active neuron-rtd
inactive

sudo systemctl restart neuron-rtd

systemctl is-active neuron-rtd
inactive

I am running this using the Python (Neuron TensorFlow 2) Kernel

Please help

Thanks, Subham

aws-zejdaj commented 3 years ago

It's possible the runtime is not starting because the driver is not active. Do you mind uninstalling and re-installing the aws-neuron-dkms (driver) package on your instance? Updates to the Linux kernel require reinstallation of our aws-neuron-dkms package, and this is the most likely issue. If re-installing the driver does not fix the problem, please post the output of these commands to help us debug the issue further:

lsmod | grep neuron
sudo systemctl status neuron-rtd

Additional details on this debug topic, along with other potentially useful troubleshooting info related to the runtime, can be found here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-troubleshoot.html#neuron-services-fail-to-start
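
For reference, a typical driver reinstall on an apt-based instance might look like the following (a sketch; the exact package-source configuration is assumed to already be in place):

sudo apt-get remove aws-neuron-dkms
sudo apt-get install aws-neuron-dkms
sudo systemctl restart neuron-rtd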

subhamkhemka commented 3 years ago

I wasn't able to fix the issue on the existing AMI, so I launched a fresh instance with Ubuntu DLAMI version 48; the sentiment analysis example now works fine.

I am getting an error when replicating this for the zero-shot classification task (after updating to the latest version of transformers):

from transformers import pipeline
import tensorflow as tf
import tensorflow.neuron as tfn

model_name = 'facebook/bart-large-mnli'

pipe = pipeline('zero-shot-classification', model=model_name, framework='tf')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-4c9dfcd1ef90> in <module>
      5 model_name = 'facebook/bart-large-mnli'
      6 
----> 7 pipe = pipeline('zero-shot-classification', model=model_name, framework='tf')

~/neuron_tf2_env/lib/python3.6/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
    433         revision=revision,
    434         task=task,
--> 435         **model_kwargs,
    436     )
    437 

~/neuron_tf2_env/lib/python3.6/site-packages/transformers/pipelines/base.py in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
    141 
    142         if isinstance(model, str):
--> 143             raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
    144 
    145     framework = "tf" if model.__class__.__name__.startswith("TF") else "pt"

ValueError: Could not load model facebook/bart-large-mnli with any of the following classes: (<class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSequenceClassification'>,).

Any suggestions?

aws-zejdaj commented 3 years ago

Our runtime team experts are looking into the issue. In the meantime, could you please post your Inf1 instance ID, start and stop the instance, and then rerun? If the issue persists, please reach out to us directly by e-mail at aws-neuron-support@amazon.com

subhamkhemka commented 3 years ago

We have raised a support case with our TAM; we will update the issue here once we hear from them.

Thanks, Subham

mrnikwaws commented 3 years ago

This topic has been picked up via the customer's account manager; however, I wanted to share some sample code (prepared by another member of the team) which may help other customers:

from transformers import pipeline
import tensorflow as tf
import tensorflow.neuron as tfn
import time

#model_name = 'facebook/bart-large-mnli'
# 'typeform/distilbert-base-uncased-mnli' is unsupported 
# (see SUPPORTED_TASKS in https://huggingface.co/transformers/_modules/transformers/pipelines.html)
#model_name = 'typeform/distilbert-base-uncased-mnli'
# Choosing supported model for 'zero-shot-classification' task
model_name = 'roberta-large-mnli'
pipe = pipeline('zero-shot-classification', model=model_name, framework='tf')

sequence_to_classify = "one day I will see the world"

# 52 labels (classes)
# 1 million sequences 
# 128 seqlen (varies 5 to 128)

# stats for typeform/distilbert-base-uncased-mnli
# g4dn.2xlarge: 5 minutes 24 sec for 10k sequences
# p2.xlarge: 11 minutes 42 sec for 10k sequences
candidate_labels = ['travel', 'cooking', 'dancing']

start = time.time()
print(pipe(sequence_to_classify, candidate_labels))
print("CPU infer time: ", time.time() - start)

# On g4 machine, DLAMI v48, pytorch_p36
#(pytorch_p36) ubuntu@ip-172-31-6-163:~/github$ python test.py
#{'sequence': 'one day I will see the world', 'labels': ['travel', 'dancing', 'cooking'], 'scores': [0.9938651323318481, 0.003273785812780261, 0.002861040411517024]}

neuron_pipe = pipeline('zero-shot-classification', model=model_name, framework='tf')

#the first step is to modify the underlying tokenizer to create a static
#input shape as inferentia does not work with dynamic input shapes
original_tokenizer = pipe.tokenizer

#we intercept the function call to the original tokenizer
#and inject our own code to modify the arguments
def wrapper_function(*args, **kwargs):
    kwargs['padding'] = 'max_length'
    #this is the key line here to set a static input shape
    #so that all inputs are set to a len of 128
    kwargs['max_length'] = 128
    kwargs['truncation'] = True
    kwargs['return_tensors'] = 'tf'
    return original_tokenizer(*args, **kwargs)

#insert our wrapper function as the new tokenizer as well
#as reinserting back some attribute information that was lost
#when we replaced the original tokenizer with our wrapper function
neuron_pipe.tokenizer = wrapper_function
neuron_pipe.tokenizer.decode = original_tokenizer.decode
neuron_pipe.tokenizer.mask_token_id = original_tokenizer.mask_token_id
neuron_pipe.tokenizer.pad_token_id = original_tokenizer.pad_token_id
neuron_pipe.tokenizer.convert_ids_to_tokens = original_tokenizer.convert_ids_to_tokens

#Now that our neuron_classifier is ready we can use it to
#generate an example input which is needed to compile the model
#note that pipe.model is the actual underlying model itself which
#is what Tensorflow Neuron actually compiles.
example_inputs = neuron_pipe.tokenizer('we can use any string here to generate example inputs')
#compile the model by calling tfn.trace by passing in the underlying model
#and the example inputs generated by our updated tokenizer
start = time.time()
neuron_model = tfn.trace(pipe.model, example_inputs)
print("Neuron compile time: ", time.time() - start)
#saved_model_dir = './neuron-' + model_name
#neuron_model.save(saved_model_dir)
#tf.keras.models.load_model(saved_model_dir)

#now we can insert the neuron_model and replace the cpu model
#so now we have a huggingface pipeline that uses an underlying neuron model!
neuron_pipe.model = neuron_model
neuron_pipe.model.config = pipe.model.config

start = time.time()
print(neuron_pipe(sequence_to_classify, candidate_labels))
print("Neuron infer time: ", time.time() - start)

vsuman-ontic commented 1 year ago

While saving the model to local disk, neuron_model.save(saved_model_dir) throws the error tensorflow.python.saved_model.nested_structure_coder.NotEncodableError.

Error stack trace:

  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2132, in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 150, in save_model
    saved_model_save.save(model, filepath, overwrite, include_optimizer,
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/keras/saving/saved_model/save.py", line 89, in save
    saved_nodes, node_paths = save_lib.save_and_return_nodes(
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1268, in save_and_return_nodes
    _build_meta_graph(obj, signatures, options, meta_graph_def))
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1441, in _build_meta_graph
    return _build_meta_graph_impl(obj, signatures, options, meta_graph_def)
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 1405, in _build_meta_graph_impl
    object_graph_proto = _serialize_object_graph(saveable_view,
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 967, in _serialize_object_graph
    serialized = function_serialization.serialize_concrete_function(
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/function_serialization.py", line 73, in serialize_concrete_function
    nested_structure_coder.encode_structure(
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 103, in encode_structure
    return _map_structure(nested_structure, _get_encoders())
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 85, in _map_structure
    return do(pyobj, recursion_fn)
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 188, in do_encode
    encoded_tuple.tuple_value.values.add().CopyFrom(encode_fn(element))
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 85, in _map_structure
    return do(pyobj, recursion_fn)
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 188, in do_encode
    encoded_tuple.tuple_value.values.add().CopyFrom(encode_fn(element))
  File "/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow2_p38/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py", line 86, in _map_structure
    raise NotEncodableError(
tensorflow.python.saved_model.nested_structure_coder.NotEncodableError: No encoder for object {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask')} of type <class 'transformers.tokenization_utils_base.BatchEncoding'>.

@aws-taylor, could you please help us resolve this?

Thank you, VS
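
A note for anyone else hitting this NotEncodableError: the trace above fails while encoding a transformers BatchEncoding, a dict subclass that TensorFlow's SavedModel serializer does not know how to encode. A possible workaround (an untested sketch, reusing the neuron_pipe, pipe, and model_name objects from the sample above) is to convert the example inputs to a plain dict before tracing and saving:

import tensorflow.neuron as tfn

# the tokenizer returns a BatchEncoding, which trips the SavedModel
# serializer; a plain dict of tensors avoids that code path
example_inputs = dict(neuron_pipe.tokenizer(
    'we can use any string here to generate example inputs'))
neuron_model = tfn.trace(pipe.model, example_inputs)

saved_model_dir = './neuron-' + model_name
neuron_model.save(saved_model_dir)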