georgia-tech-db / evadb

Database system for AI-powered apps
https://evadb.ai/docs
Apache License 2.0

Text Summarization: 'int' object has no attribute 'isnumeric' #1355

Open onefanwu opened 12 months ago

onefanwu commented 12 months ago

Bug

evadb=#SELECT TextSummarizer(article) FROM cnn_news_test;
@status: ResponseStatus.FAIL
@batch: 
 None
@error: 'int' object has no attribute 'isnumeric'

When I run the queries in the text_summarization benchmark, I get the above error.

The queries used are as follows:

DROP TABLE IF EXISTS cnn_news_test;

CREATE TABLE IF NOT EXISTS cnn_news_test(
        id TEXT(128),
        article TEXT(4096),
        highlights TEXT(1024)
    );

DROP FUNCTION IF EXISTS TextSummarizer;

CREATE FUNCTION IF NOT EXISTS TextSummarizer
      TYPE HuggingFace
      TASK 'summarization'
      MODEL 'benchmark/models/distilbart-cnn-12-6'
      MIN_LENGTH 5
      MAX_LENGTH 100;

DROP TABLE IF EXISTS cnn_news_summary;

LOAD CSV 'benchmark/datasets/text/cnn_dailymail/test.csv'
INTO cnn_news_test;

CREATE TABLE IF NOT EXISTS cnn_news_summary AS
SELECT TextSummarizer(article) FROM cnn_news_test;

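The error is probably caused by the following section in hf_abstract_function.py. The unquoted values MIN_LENGTH 5 and MAX_LENGTH 100 appear to reach this loop as int rather than str, and int has no isnumeric() method: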

        for entry in function_obj.metadata:
            if entry.value.isnumeric():
                pipeline_args[entry.key] = int(entry.value)
            else:
                pipeline_args[entry.key] = entry.value
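The failure and one possible guard can be sketched in isolation. This is only a minimal sketch, not the project's actual fix: `MetadataEntry` and `build_pipeline_args` are hypothetical stand-ins for the entries of `function_obj.metadata` and the loop above, and the key names are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class MetadataEntry:
    """Hypothetical stand-in for one entry of function_obj.metadata."""
    key: str
    value: Any


def build_pipeline_args(metadata: Iterable[MetadataEntry]) -> Dict[str, Any]:
    """Same loop as above, but only calls isnumeric() on str values."""
    pipeline_args: Dict[str, Any] = {}
    for entry in metadata:
        value = entry.value
        if isinstance(value, str) and value.isnumeric():
            # Numeric strings such as '5' are converted to int.
            pipeline_args[entry.key] = int(value)
        else:
            # Ints and non-numeric strings are passed through unchanged.
            pipeline_args[entry.key] = value
    return pipeline_args


# Reproduce the reported error: int does not define isnumeric().
try:
    (5).isnumeric()
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Assumed key names; the real metadata keys may differ.
args = build_pipeline_args(
    [
        MetadataEntry("min_length", 5),      # unquoted in SQL, arrives as int
        MetadataEntry("max_length", "100"),  # quoted in SQL, arrives as str
        MetadataEntry("task", "summarization"),
    ]
)
print(error_message)
print(args)
```

With the `isinstance` check, both quoted and unquoted numeric values end up as `int` in `pipeline_args`, so the query would not depend on how the values were written.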

xzdandy commented 12 months ago

Thanks @onefanwu for reporting this. I will fix it.

I think a workaround for now is to quote the numeric values:

CREATE FUNCTION IF NOT EXISTS TextSummarizer
      TYPE HuggingFace
      TASK 'summarization'
      MODEL 'benchmark/models/distilbart-cnn-12-6'
      MIN_LENGTH '5'
      MAX_LENGTH '100';
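Why quoting helps can be checked against the loop quoted in the report. The sketch below copies that logic into a hypothetical helper, `original_loop`, and feeds it plain `(key, value)` pairs with assumed key names instead of real metadata entries.

```python
def original_loop(pairs):
    """Same branching as the reported loop, applied to (key, value) pairs."""
    pipeline_args = {}
    for key, value in pairs:
        if value.isnumeric():  # only works when value is a str
            pipeline_args[key] = int(value)
        else:
            pipeline_args[key] = value
    return pipeline_args


# Quoted values arrive as strings, so isnumeric() exists and they become ints.
quoted = original_loop([("min_length", "5"), ("max_length", "100")])

# Unquoted values arrive as ints and trigger the reported AttributeError.
try:
    original_loop([("min_length", 5), ("max_length", 100)])
    unquoted_failed = False
except AttributeError:
    unquoted_failed = True

print(quoted)
print(unquoted_failed)
```

Under these assumptions the quoted form still reaches the pipeline as integers, so the workaround should not change the summarization lengths.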