Update Optional field to be not required

hemslo commented 1 year ago

If a field is Optional, generated code shouldn't mark it as required like Field(..., alias="some_alias"), doc.

This change will allow dynamic graphql response with directives to pass validation.

chassing commented 1 year ago

Please don't merge this PR. It's on purpose to have optional fields as required. The main purpose of the GQL classes is the validation of the schemas in our code. We want to know if a schemas change breaks our code. So we have to specify all attributes, including optionals, in case of breaking changes, mypy will notice it.

For our tests, we've introduced gql_class_factory and data_factory pytests fixtures for easier initialization of GQL classes and fixtures.

hemslo commented 1 year ago

@chassing This is blocking https://github.com/app-sre/qontract-reconcile/pull/3470, we need optional field to be not required to pass validation, otherwise typed query will fail to dynamic response (with @include directive).

chassing commented 1 year ago

I don't know the @include directive. How does it work?

hemslo commented 1 year ago

It works similar to our template get_aws_account, but in server side, we always pass the same gql body, but with different query variables, graphql server will generate different response. Official doc https://www.apollographql.com/docs/apollo-server/schema/directives/#default-directives

chassing commented 1 year ago

For this change, you need approval from @fishi0x01; as I said, he implemented it on purpose. I could imagine a special # qenerate: flag to change this behavior.

hemslo commented 1 year ago

This is not a blocker anymore because @include is removed from the other pr, but it still worth checking why we made optional field required. That means we can't use @include or @skip directives.

fishi0x01 commented 1 year ago

This is a very interesting topic indeed. Here are my main thoughts around this.

1. Type ambiguity with Unions

The main reason we enforce Optionals to be set is to achieve a strict data <-> schema mapping in order to be sure to convert the nested untyped dicts to the proper data types.

@include basically adds another layer of type ambiguity. This can be problematic when it comes to Union attributes.

Inline fragments are interpreted as Unions. Pydantic deduces the type by trying to feed the dict to each type in the Union. The first that works wins. That means that you can construct potentially ambiguous queries with @include.

Lets take the following as an example:

query C($flag: Boolean!) {
  data {
    x
    ... on B {
      y
    }
    ... on D {
      y
      z @include(if: $flag) {
        sth
      }
    }
    ... on E {
      somethingelse
    }
  }
}

If we say that z in D is not required to be set, then pydantic might falsely map all returned data as a B. My understanding is that this is potentially dangerous if the business logic uses instanceof checks (something we often rely on in our code-base).

However, one might argue that a query like the above doesn't really make much sense, is very constructed and unlikely to happen. Still, this made me worry and Im not sure if there are more cases where that might be a potential issue - so back then the decision was done to strictly match the data structure with the data to be on the safe side.

2. @include with NonNull fields

If we decide to make Optionals not required, then my understanding is that there is still a problem with NonNull fields. qenerate strictly deduces types based on the gql schema / introspection.json. I.e., if @include is applied on a NonNull field, then the generated field won't be optional. E.g., lets take this query, where instance is a NonNull field in our schema:

query JenkinsConfigs($flag: Boolean!) {
  jenkins_configs: jenkins_configs_v1 {
    name
    ...on JenkinsConfig_v1 {
      instance @include(if: $flag) {
        name
      }
      type
    }
  }
}

This will yield:

class JenkinsConfigV1_JenkinsConfigV1(JenkinsConfigV1):
    instance: JenkinsInstanceV1 = Field(..., alias="instance")
    q_type: str = Field(..., alias="type")

I.e., even if we make Optionals not required, then any NonNull gql field will still fail (in this case the instance field). To fix this, qenerate needs to be able to properly detect @include in the query tree and make forcefully make everything inside an Optional. I guess this is possible to do while running the visitor on the query.

3. Why do we need @include?

My understanding is that @include merely exists as syntactic sugar, i.e., it reduces boilerplate on client-side. You can lump together multiple queries into a single one in order to reduce boilerplate.

Personally, I think it would be better to have queries dedicated for a specific set of data in order to reduce ambiguity (similar to the notion of having a function for single purpose: having a query for a single data set). I.e., it is fine to have 2 queries if you want to have 2 different data sets. If you want to reduce boilerplate, then I'd use fragments instead. Fragments cannot reduce boilerplate as good as @include, but they do not introduce type ambiguity and convert potentially required fields to optional fields. However, in the end this is a taste question and should not be enforced by qenerate - so this is not a valid concern to not support @include.

Conclusion

All that being said, qenerate should not enforce any style and ideally fully comply with the gql standard. In order to do so, we somehow must overcome the type-ambiguity issue. I do not have a good idea here right now. Simplest mitigation might be adding an extra feature flag as Chris pointed out. I.e., if you want to consciously use use @include, then you can opt-in via feature flag. That will give the behavior of not requiring Optionals to be set in the data at the cost of extra type-ambiguity. As long as instanceof is avoided in business logic or you do not have any affected inline fragments, then this wont be an issue. Still I would suggest to not have this as default behavior.

Wdyt?

hemslo commented 1 year ago

By following GraphQL spec, if a field has no value, but it's defined in query, it should be kept in response (no directive case). So always make optional field required is OK, as optional is derived from GraphQL spec, not really a normal optional as we usually use. Some JSON serializer may have options to ignore empty values, but to comply with GraphQL spec, it shouldn't be used.

Directives may makes things dynamic, so it's hard to define a complete static typing in this case, may worth explore how it's done in other languages, how to they support this feature?

app-sre / qenerate