MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Possibly incorrect code in documentation on model inference with Hugging Face Transformers #108859

Closed macc-n closed 1 year ago

macc-n commented 1 year ago

Hello,

I'm trying to run the NER inference pipeline example from this article: https://learn.microsoft.com/it-it/azure/databricks/machine-learning/train-model/huggingface/model-inference-nlp

The exact code is the following:

%pip install torch transformers

import pandas as pd
from transformers import pipeline
import torch
from pyspark.sql.functions import pandas_udf

device = 0 if torch.cuda.is_available() else -1

texts = ["Hugging Face is a French company based in New York City.", "Databricks is based in San Francisco."]
df = spark.createDataFrame(pd.DataFrame(texts, columns=["text"]))

ner_pipeline = pipeline(task="ner", model="Davlan/bert-base-multilingual-cased-ner-hrl", aggregation_strategy="simple", device=device)

@pandas_udf('array<struct<word string, entity_group string, score float, start integer, end integer>>')
def ner_udf(texts: pd.Series) -> pd.Series:
  return pd.Series(ner_pipeline(texts.to_list(), batch_size=1))

display(df.select(df.text, ner_udf(df.text).alias('entities')))

But I receive the following error at the definition of the function ner_udf:

NotImplementedError: Invalid return type with scalar Pandas UDFs: ArrayType(StructType([StructField('word', StringType(), True), StructField('entity_group', StringType(), True), StructField('score', FloatType(), True), StructField('start', IntegerType(), True), StructField('end', IntegerType(), True)]), True) is not supported

I'm using a cluster with Databricks Runtime 11.3 LTS (Apache Spark 3.3.0, Scala 2.12).
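
The error message says that the array<struct<...>> return type is not accepted by scalar Pandas UDFs on this runtime. One possible workaround, which nobody in this thread has confirmed, is to have the UDF return a plain string holding the entities as JSON and to parse that string back into an array of structs on the Spark side. The helper below is a minimal sketch of the serialization step only. The name entities_to_json and the sample data are hypothetical, and the sketch assumes the pipeline returns one list of entity dicts per input text with the five keys from the schema above.

```python
import json


def entities_to_json(batch_results):
    """Serialize each text's entity list to one JSON string.

    Hypothetical helper, not part of the documented example.
    Assumes every entity dict has the keys word, entity_group,
    score, start, and end.
    """
    rows = []
    for entities in batch_results:
        cleaned = [
            {
                "word": str(e["word"]),
                "entity_group": str(e["entity_group"]),
                # Scores may arrive as NumPy floats, which json.dumps rejects,
                # so cast to built-in Python types first.
                "score": float(e["score"]),
                "start": int(e["start"]),
                "end": int(e["end"]),
            }
            for e in entities
        ]
        rows.append(json.dumps(cleaned))
    return rows


# Hypothetical sample shaped like NER pipeline output for two texts.
sample = [
    [{"word": "Databricks", "entity_group": "ORG", "score": 0.99, "start": 0, "end": 10}],
    [],
]
encoded = entities_to_json(sample)

# Untested sketch of how this could plug into the example above:
#
#   @pandas_udf('string')
#   def ner_json_udf(texts: pd.Series) -> pd.Series:
#       return pd.Series(entities_to_json(ner_pipeline(texts.to_list(), batch_size=1)))
#
# The string column could then be parsed with pyspark.sql.functions.from_json
# and the same array<struct<...>> schema string.
```

The trade-off is an extra serialization round trip per row, in exchange for a UDF return type (string) that scalar Pandas UDFs accept.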


Document details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

Naveenommi-MSFT commented 1 year ago

@macc-n Thanks for your feedback! We will investigate and update as appropriate.

RamanathanChinnappan-MSFT commented 1 year ago

@macc-n

I've delegated this to @mssaperla, a content author, to review and share their valuable insights.

kateglee-db commented 1 year ago

Thanks for providing feedback that helps improve our documentation. We've created an internal work item (DOC-9222) to address your feedback. The timeline for resolution varies based on resourcing.

In the meantime, we recommend that you reach out to the Databricks Online User Community.

please-close