AI21Labs / SageMaker

Examples for using AI21's models through Amazon SageMaker

access to Tokenizer on SageMaker-deployed models #7

Closed · sermolin closed this 7 months ago

sermolin commented 1 year ago

Hello. https://github.com/stanford-crfm/helm uses AI21's Tokenize API (https://docs.ai21.com/reference/tokenize-ref) to validate that an input will not exceed the model's context length. Is it possible to access the internal Tokenize function on a model that is already deployed to a SageMaker endpoint? I'm looking for an API call structure like:

J2 Mid

    response_mid = ai21.Tokenize.execute(
        destination=ai21.SageMakerDestination(""),
        prompt="explain black holes to 8th graders",
    )

yuvalbelfer commented 1 year ago

Hey, we plan to open-source our Tokenizer in the near future. In the meantime, you can always call one of our Foundation models with maxTokens = 0; the response will contain the tokenized prompt.

sermolin commented 1 year ago

Shalom, Yuval. My goal is to make sure that the input does not exceed the model's input context length (since we don't know the exact "words" => "tokens" mapping). Could you please show how I would use the above workaround? I deployed an AI21 model on SageMaker. This is what I do:

    import json
    import boto3

    boto3_client = boto3.client("sagemaker-runtime")

    # maxTokens=0 returns an empty completion, but the response still
    # contains the tokenized prompt
    data = {
        "prompt": "to be, or",
        "numResults": 1,
        "maxTokens": 0,
        "temperature": 0
    }
    response_boto3 = boto3_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=json.dumps(data),
        ContentType="application/json",
        Accept="application/json"
    )
    body = response_boto3["Body"].read()
    body = json.loads(body.decode("utf-8"))
    body

This is what I get:

{'id': 1234,
 'prompt': {'text': 'to be, or',
  'tokens': [{'generatedToken': {'token': '▁to▁be',
     'logprob': -11.829436302185059,
     'raw_logprob': -11.829436302185059},
    'topTokens': None,
    'textRange': {'start': 0, 'end': 5}},
   {'generatedToken': {'token': ',',
     'logprob': -4.737940311431885,
     'raw_logprob': -4.737940311431885},
    'topTokens': None,
    'textRange': {'start': 5, 'end': 6}},
   {'generatedToken': {'token': '▁or',
     'logprob': -1.4099360704421997,
     'raw_logprob': -1.4099360704421997},
    'topTokens': None,
    'textRange': {'start': 6, 'end': 9}}]},
 'completions': [{'data': {'text': '', 'tokens': []},
   'finishReason': {'reason': 'length', 'length': 0}}]}

Are you suggesting that I then do input_token_count = len(body['prompt']['tokens'])?

Related question: Is there a way for me to programmatically find out the input context length of a deployed model (without reading the documentation)?

yifanmai commented 1 year ago

+1 for open-sourcing a local tokenizer. This would speed up HELM evaluation a lot if we could do tokenization locally rather than having to make API calls. We're already using local tokenization for most other API models. See https://github.com/stanford-crfm/helm/issues/1772

yuvalbelfer commented 1 year ago


> Are you suggesting that I then do input_token_count = len(body['prompt']['tokens'])?

> Related question: Is there a way for me to programmatically find out the input context length of a deployed model (without reading the documentation)?

Yes - this is exactly what I meant :)
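
Put together, that check might look roughly like the sketch below (the helper name and the hard-coded limit value are illustrative only, not part of the AI21 API; as noted next, the per-model limit currently has to come from the documentation):

    import json
    import boto3

    boto3_client = boto3.client("sagemaker-runtime")

    # Illustrative value only: the per-model limit is not exposed through an API,
    # so it has to be taken from the model documentation
    MAX_CONTEXT_TOKENS = 8191

    def count_prompt_tokens(endpoint_name, prompt):
        """Count prompt tokens by calling the endpoint with maxTokens=0."""
        data = {"prompt": prompt, "numResults": 1, "maxTokens": 0, "temperature": 0}
        response = boto3_client.invoke_endpoint(
            EndpointName=endpoint_name,
            Body=json.dumps(data),
            ContentType="application/json",
            Accept="application/json",
        )
        body = json.loads(response["Body"].read().decode("utf-8"))
        return len(body["prompt"]["tokens"])

    n_tokens = count_prompt_tokens(endpoint_name, "to be, or")
    if n_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError(f"prompt is {n_tokens} tokens, over the model limit")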

Regarding your last question - are you looking for a way to get the input restriction per model through an API? We currently don't have this kind of option (and if you meant something else - please ask again).

Also - I'm glad there's demand for open-sourcing our tokenizer :) I'm pushing for it, hopefully it will land soon.

sermolin commented 1 year ago

We are trying to avoid a “hard failure” from your models when they are invoked with a large input that exceeds the context length. For that, the CRFM-HELM benchmark first invokes your Tokenize API to compare the input token count to the maximum a model accepts, and truncates/modifies the input until it conforms to the model's requirement. My understanding is that right now your max input length is hard-coded and needs to be updated for each model. Yes, you are correct: I am asking for a way to “get the input restriction per model through an API”.
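
For illustration, that truncation step can reuse the maxTokens = 0 response shown above: each token's textRange gives exact character offsets, so the prompt can be cut at a token boundary. A rough sketch (not HELM's actual implementation):

    def truncate_prompt(body, max_tokens):
        """Truncate the prompt to at most max_tokens tokens, using the
        textRange offsets from a maxTokens=0 response."""
        tokens = body["prompt"]["tokens"]
        if len(tokens) <= max_tokens:
            return body["prompt"]["text"]
        # Character offset where the last token that still fits ends
        cutoff = tokens[max_tokens - 1]["textRange"]["end"]
        return body["prompt"]["text"][:cutoff]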


etang-ai21 commented 7 months ago

Hi, I see this issue has been open for a while. We have open-sourced our tokenizer in the following project: https://github.com/AI21Labs/ai21-tokenizer

And it's also available now in our ai21 SDK v2: https://github.com/AI21Labs/ai21-python
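
A minimal local-tokenization sketch, assuming the Tokenizer.get_tokenizer() entry point shown in the ai21-tokenizer README (see that repo for the current API):

    # pip install ai21-tokenizer
    from ai21_tokenizer import Tokenizer

    tokenizer = Tokenizer.get_tokenizer()
    prompt = "to be, or"
    token_ids = tokenizer.encode(prompt)
    print(len(token_ids))               # token count, no endpoint call needed
    print(tokenizer.decode(token_ids))  # round-trips back to the original text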

Please let us know if it solves your issue.

yifanmai commented 7 months ago

This resolves the issue. Thank you!