Hey, we plan to open-source our tokenizer in the near future. In the meantime, you can always call one of our Foundation models with maxTokens = 0, and the response will contain the tokenized prompt.
Shalom, Yuval. My goal is to make sure that the input does not exceed the model's input context length (since we don't quite know the "words" => "tokens" mapping). Could you please show how I would use the above workaround? I deployed an AI21 model on SageMaker. This is what I do:
import json

import boto3

# The tokenization workaround: with maxTokens=0 the model generates nothing,
# but the response still includes the tokenized prompt.
data = {
    "prompt": "to be, or",
    "numResults": 1,
    "maxTokens": 0,
    "temperature": 0,
}

boto3_client = boto3.client("sagemaker-runtime")  # runtime client for invoke_endpoint
response_boto3 = boto3_client.invoke_endpoint(
    EndpointName=endpoint_name,  # name of the deployed AI21 endpoint
    Body=json.dumps(data),
    ContentType="application/json",
    Accept="application/json",
)
body = json.loads(response_boto3["Body"].read().decode("utf-8"))
body
This is what I get:
{'id': 1234,
'prompt': {'text': 'to be, or',
'tokens': [{'generatedToken': {'token': '▁to▁be',
'logprob': -11.829436302185059,
'raw_logprob': -11.829436302185059},
'topTokens': None,
'textRange': {'start': 0, 'end': 5}},
{'generatedToken': {'token': ',',
'logprob': -4.737940311431885,
'raw_logprob': -4.737940311431885},
'topTokens': None,
'textRange': {'start': 5, 'end': 6}},
{'generatedToken': {'token': '▁or',
'logprob': -1.4099360704421997,
'raw_logprob': -1.4099360704421997},
'topTokens': None,
'textRange': {'start': 6, 'end': 9}}]},
'completions': [{'data': {'text': '', 'tokens': []},
'finishReason': {'reason': 'length', 'length': 0}}]}
Are you suggesting that I then do
input_token_count = len(body['prompt']['tokens'])
?
Related question: is there a way to programmatically find out the maximum input context length of a deployed model (without reading the documentation)?
+1 for open sourcing a local tokenizer. This would make HELM evaluation much more efficient, since we could tokenize locally rather than making API calls. We're already using local tokenization for most other API models. See https://github.com/stanford-crfm/helm/issues/1772
Yes - this is exactly what I meant :)
Regarding your last question - are you looking for a way to get the input restriction per model through an API? We currently don't have this kind of option (and if you meant something else - please ask again).
Also - I'm glad there's demand for open sourcing our tokenizer :) I'm pushing for it; hopefully it will be available soon.
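For readers following along, the confirmed workaround can be wrapped in a small helper. This is only a sketch: count_tokens is a made-up name, endpoint_name refers to your deployed AI21 endpoint, and the code assumes the response shape shown in the output above.

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def count_tokens(prompt: str, endpoint_name: str) -> int:
    # Invoke the endpoint with maxTokens=0: no text is generated, but the
    # response still carries the tokenized prompt (see the output above).
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=json.dumps({
            "prompt": prompt,
            "numResults": 1,
            "maxTokens": 0,
            "temperature": 0,
        }),
        ContentType="application/json",
        Accept="application/json",
    )
    body = json.loads(response["Body"].read().decode("utf-8"))
    return len(body["prompt"]["tokens"])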
We are trying to avoid a "hard failure" from your models when they are invoked with an input that exceeds the context length. To that end, the CRFM-HELM benchmark first invokes your Tokenize API, compares the input token count to the maximum the model accepts, and truncates/modifies the input until it conforms to the model's limit. My understanding is that right now your max input length is hard-coded and needs to be updated for each model. Yes, you are correct – I am asking to "get the input restriction per model through an API".
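As a rough sketch of that truncate-until-it-fits step (not HELM's actual implementation): the textRange fields in the tokenized prompt mark where each token starts and ends in the raw text, so the prompt can be cut at a token boundary. Here max_input_tokens stands in for the hard-coded per-model limit mentioned above, and the snippet reuses the sagemaker_runtime client from the previous sketch.

def truncate_to_fit(prompt: str, endpoint_name: str, max_input_tokens: int) -> str:
    # Tokenize via the maxTokens=0 workaround, then cut the raw text at the
    # end of the last token that still fits within the model's input limit.
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=json.dumps({
            "prompt": prompt,
            "numResults": 1,
            "maxTokens": 0,
            "temperature": 0,
        }),
        ContentType="application/json",
        Accept="application/json",
    )
    body = json.loads(response["Body"].read().decode("utf-8"))
    tokens = body["prompt"]["tokens"]
    if len(tokens) <= max_input_tokens:
        return prompt
    cut = tokens[max_input_tokens - 1]["textRange"]["end"]
    return prompt[:cut]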
Hi, I see this issue has been open for a while. We have open sourced our tokenizer in the following project: https://github.com/AI21Labs/ai21-tokenizer
And it's also available now in our ai21 SDK v2: https://github.com/AI21Labs/ai21-python
Please let us know if it solves your issue.
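With the tokenizer now open sourced, the token count can be computed locally, with no endpoint call at all. A minimal sketch based on the ai21-tokenizer README (the exact API may differ between versions):

# pip install ai21-tokenizer
from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()  # loads the Jurassic-2 tokenizer by default

token_ids = tokenizer.encode("to be, or")
print(len(token_ids))               # local token count, no API call needed
print(tokenizer.decode(token_ids))  # round-trips back to the original text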
This resolves the issue. Thank you!
Hello. https://github.com/stanford-crfm/helm uses the AI21 Tokenize API call (https://docs.ai21.com/reference/tokenize-ref) to validate that the input will not exceed the model's input context length. Is it possible to access the internal Tokenize function on a model that's already deployed to a SageMaker endpoint? I'm looking for an API call structure like:
# J2 Mid
response_mid = ai21.Tokenize.execute(
    destination=ai21.SageMakerDestination(""),
    prompt="explain black holes to 8th graders",
)