danielgtaylor / python-betterproto

Clean, modern, Python 3.6+ code generator & library for Protobuf 3 and async gRPC
MIT License

Generate additional Type Hints for the Dictionary Form of Messages & Enums #584

Open Atokulus opened 2 months ago

Atokulus commented 2 months ago

Summary

I frequently use the dictionary form of betterproto messages (from_dict(), to_dict()) with stringified enums as arguments for web endpoints and database ODMs. Having an exact type hint for the dictionary form would allow for automatic data validation and more.

What is the feature request for?

The core library

The Problem

I frequently use the dictionary form of betterproto messages (from_dict(), to_dict()) as arguments for web endpoints (Quart, Quart Schema) and database ODMs (such as Beanie for MongoDB).

These frameworks often use pydantic for validation as well as for documentation generation (such as Swagger). Having an exact type hint for the dictionary form would enable automatic data validation, generated API documentation, and more.

Using the original Message type hints directly (i.e. the @dataclass-annotated message classes) is an option. Yet in some of my cases I need the stringified form of the protobuf enums rather than the integer-based one. Generating a dynamic type hint at runtime is cumbersome and not a real solution.

The Ideal Solution

It would be great to have dictionary types for messages generated side by side with the current dataclasses. These dictionary types could then be used by validation frameworks.

See the following example:

enum DocumentType {
  UNKNOWN = 0;
  CONTRACT = 1;
  INVOICE = 2;
}

message Document {
  DocumentType type = 1;
  string content = 2;
}

This would generate the following code:

from dataclasses import dataclass
from enum import Enum
from typing import TypedDict

import betterproto

# Currently generated
class DocumentType(betterproto.Enum):
    UNKNOWN = 0
    CONTRACT = 1
    INVOICE = 2

@dataclass(eq=False, repr=False)
class Document(betterproto.Message):
    type: "DocumentType" = betterproto.enum_field(1)
    content: str = betterproto.string_field(2)

# Additionally generated
class DocumentTypeStringified(str, Enum):
    UNKNOWN = "UNKNOWN"
    CONTRACT = "CONTRACT"
    INVOICE = "INVOICE"

class DocumentDict(TypedDict):
    type: DocumentTypeStringified
    content: str
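To make the proposal more concrete, here is a minimal sketch of how a generator could emit the additional types as Python source. EnumSpec, MessageSpec and the render_* functions are hypothetical and deliberately simplified; a real protoc plugin would work from the parsed CodeGeneratorRequest rather than hand-written specs, so this only illustrates the shape of the output.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical, simplified stand-ins for parsed .proto definitions.
@dataclass
class EnumSpec:
    name: str
    values: List[str]


@dataclass
class MessageSpec:
    name: str
    # (field name, Python type expression); enum-typed fields are assumed to
    # point at the corresponding "<Enum>Stringified" class
    fields: List[Tuple[str, str]]


def render_stringified_enum(spec: EnumSpec) -> str:
    """Emit a str-valued Enum whose values equal the proto enum value names."""
    lines = [f"class {spec.name}Stringified(str, Enum):"]
    lines += [f'    {value} = "{value}"' for value in spec.values]
    return "\n".join(lines)


def render_typed_dict(spec: MessageSpec) -> str:
    """Emit a TypedDict mirroring the message's dictionary form."""
    lines = [f"class {spec.name}Dict(TypedDict):"]
    lines += [f"    {name}: {type_expr}" for name, type_expr in spec.fields]
    if not spec.fields:
        lines.append("    pass")  # empty messages still need a class body
    return "\n".join(lines)


def render_module(enums: List[EnumSpec], messages: List[MessageSpec]) -> str:
    """Concatenate imports, enums and TypedDicts into one module's source."""
    header = "from enum import Enum\nfrom typing import TypedDict"
    parts = [header]
    # Enums come first: TypedDict annotations are evaluated at class creation
    parts += [render_stringified_enum(e) for e in enums]
    parts += [render_typed_dict(m) for m in messages]
    return "\n\n\n".join(parts) + "\n"


source = render_module(
    [EnumSpec("DocumentType", ["UNKNOWN", "CONTRACT", "INVOICE"])],
    [MessageSpec("Document", [("type", "DocumentTypeStringified"), ("content", "str")])],
)
print(source)
```

Emitting the types into a separate module (as suggested further below) would keep these extra names out of the existing generated namespace.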

Now we could use this inside Quart-Schema for parameter validation and documentation:

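from quart import Quart, jsonify
from quart_schema import QuartSchema, validate_request

app = Quart(__name__)
QuartSchema(app)

@app.post("/documents/<document_name>")
@validate_request(DocumentDict)
async def create_document(document_name: str, data: DocumentDict):
    # Convert the validated dictionary form back into the betterproto message
    document = Document().from_dict(data)
    # `db` stands for the application's persistence layer (not shown)
    await db.save_document(document)
    return jsonify(success=True), 200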
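Frameworks like Quart-Schema typically delegate this checking to pydantic. As a rough, stdlib-only illustration of what a generated DocumentDict buys, the sketch below validates an incoming payload before it would be handed to from_dict(). validate_typed_dict is a hypothetical helper, not part of betterproto, pydantic or Quart-Schema, and it only handles flat dictionaries whose field types are plain classes.

```python
from enum import Enum
from typing import Any, Dict, TypedDict, get_type_hints


# Stand-ins for the proposed generated types from the example above
class DocumentTypeStringified(str, Enum):
    UNKNOWN = "UNKNOWN"
    CONTRACT = "CONTRACT"
    INVOICE = "INVOICE"


class DocumentDict(TypedDict):
    type: DocumentTypeStringified
    content: str


def validate_typed_dict(td: type, data: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a flat dict against a TypedDict whose fields are plain classes.

    Hypothetical helper: rejects missing or unknown keys, converts strings
    into str-valued enum members and isinstance()-checks everything else.
    """
    hints = get_type_hints(td)
    missing = set(td.__required_keys__) - set(data)
    unknown = set(data) - set(hints)
    if missing or unknown:
        raise ValueError(f"missing keys {sorted(missing)}, unknown keys {sorted(unknown)}")
    result: Dict[str, Any] = {}
    for key, expected in hints.items():
        if key not in data:
            continue  # optional key (total=False) that was not supplied
        value = data[key]
        if issubclass(expected, Enum):
            result[key] = expected(value)  # ValueError for names not in the enum
        elif isinstance(value, expected):
            result[key] = value
        else:
            raise TypeError(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return result


clean = validate_typed_dict(DocumentDict, {"type": "CONTRACT", "content": "..."})
print(clean["type"] is DocumentTypeStringified.CONTRACT)  # True
```

Because the enum values are the member names, an unknown name such as "RECEIPT" is rejected before any protobuf conversion happens.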

Of course, there might be some corner cases, e.g. how should bytes fields be handled? Should this be done by manually extending the DocumentDict class and overriding the field? How should the names of the generated classes be extended without colliding with other messages? Maybe put them into their own Python file?
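For bytes, one option (an assumption, not something decided in this issue) is to follow the proto3 JSON mapping, which represents bytes as base64-encoded strings; the generated dict type would then hint such fields as str. The Attachment message and the helper functions below are hypothetical.

```python
import base64
import json
from typing import TypedDict


# Hypothetical message, not part of the example above:
#   message Attachment { string name = 1; bytes data = 2; }
# Assuming the dictionary form follows the proto3 JSON mapping, the bytes
# field travels as a base64 string, so its dict type hint would be `str`.
class AttachmentDict(TypedDict):
    name: str
    data: str  # base64-encoded bytes


def to_attachment_dict(name: str, data: bytes) -> AttachmentDict:
    """Build the dictionary form, base64-encoding the raw bytes."""
    return {"name": name, "data": base64.b64encode(data).decode("ascii")}


def attachment_bytes(d: AttachmentDict) -> bytes:
    """Recover the raw bytes from the dictionary form."""
    return base64.b64decode(d["data"], validate=True)


payload = to_attachment_dict("scan.pdf", b"%PDF-1.7")
print(json.dumps(payload))  # JSON-serializable because `data` is a str
```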

All in all, this seems like a very valuable and possibly even easy-to-implement feature for many use cases.

What do you think?

Best regards,
Markus

The Current Solution

Generating a dynamic type hint from proto enums at runtime is cumbersome and not very maintainable:

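from enum import Enum
from typing import Annotated

# DocumentTypeEnum: the betterproto-generated enum; (name, name) pairs give string values
DocumentTypeEnumStringified = Enum("DocumentTypeEnumStringified", [(item.name, item.name) for item in DocumentTypeEnum], type=str)

class MyDbDocument(Document):
    type: Annotated[str, DocumentTypeEnumStringified]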
Atokulus commented 2 months ago

P.S. I'd gladly support development of this extension.