anthropics / anthropic-sdk-python

MIT License
1.21k stars 150 forks source link

Feature-request: Pydantic validators+serializers to be able to round-trip all supported types #558

Open charles-dyfis-net opened 1 month ago

charles-dyfis-net commented 1 month ago

Right now, pydantic can't instantiate a TypeAdapter for anthropic.types.MessageParam on account of the support for file-like objects (which, by nature, can't be serialized to JSON) in the data types used for image support. Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam) will throw a PydanticSchemaGenerationError, because typing.IO[bytes] can't be represented as a pydantic_core schema.

If instead of using TypedDict the various classes were implemented as dataclasses (ideally using pydantic.dataclasses.dataclass), or were implemented using subclasses of pydantic.BaseModel, these classes could define custom serializers to convert into JSON-representable data -- for example, serializing a file-like object by actually reading its content into memory.

rattrayalex commented 1 month ago

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

cc @RobertCraigie

charles-dyfis-net commented 1 month ago

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

Sure -- I'm trying to serialize pending LLM requests to JSON to put them in a work queue, with a consumer for each backend able to execute them (one per Bedrock region with the appropriate model, one for Anthropic first-party, &c) and then deserialize and run those requests.

lingster commented 3 weeks ago

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

charles-dyfis-net commented 3 weeks ago

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

I don't actually need to serialize file-like objects.

Thing is, that doesn't matter: Because file-like objects are possible in a MessageParam, I can't instantiate a pydantic.TypeAdapter for MessageParam instances; Pydantic wants to be able to build a JSONSchema description of the type, so as long as there's something in the union that can't be represented in JSONSchema, the TypeAdapter instantiation fails during introspection before ever looking at the individual instance and what values are or aren't present.

That's the point of adding a serializer that replaces those objects with their content: the act of doing so will make messages serializable in practice even if they don't use the option to have a file handle attached, and it'll do so losslessly (in a way that lets folks use the Anthropic API and Pydantic together in a way that's natural to each and adds no extra configuration or dependencies); perhaps a bit inefficient compared to S3 or R2, but someone who cares about that inefficiency and is willing to add new service dependencies can add their own code to store content out-of-band as they see fit.

rattrayalex commented 3 weeks ago

@charles-dyfis-net can you share a full example of the code you'd like to be able to write, and what you have to do today?

rattrayalex commented 3 weeks ago

Have you looked at our .to_json() helpers? Do they help at all?

charles-dyfis-net commented 3 weeks ago

Have you looked at our .to_json() helpers? Do they help at all?

I haven't; if there exist corresponding from_json() helpers to be able to round-trip back to an object, that would be exactly what I need.

rattrayalex commented 3 weeks ago

mmm, I think something like Message.from_json('{"foo": …}') could make sense!

FWIW, I'd expect this to internally look roughy like this:

data = json.loads(…)
return Message.build(**data)

care to give that a try and see how it goes for you?

charles-dyfis-net commented 3 weeks ago

Thank you -- I'll do that, hopefully within the next few days. (I'd still prefer to see Pydantic's (de)serialization work out-of-the-box, so folks don't need to implement logic specific to the Anthropic SDK, but if this does in fact work as advertised that reduces the priority / pain level significantly).

rattrayalex commented 2 weeks ago

Great, let me know what you find!

RobertCraigie commented 2 weeks ago

Hi @charles-dyfis-net, in the next release you'll be able to use MessageParam with TypeAdapters :)

Image params won't serialise properly yet as we haven't defined a custom serialiser to handle file inputs, will have more to share on that front soon.