Blazebit / blaze-persistence

Rich Criteria API for JPA providers
https://persistence.blazebit.com
Apache License 2.0
744 stars 90 forks source link

Having __typename in a GQL query causes the query planer to fetch the PK only #1707

Closed EugenMayer closed 1 year ago

EugenMayer commented 1 year ago

Due to the changes on GraphQLEntityViewSupport#applyFetches specifically on case when meta fields like __typename are detected in the field selection

if (META_FIELDS.contains(fieldName)) {
    // If this is a GraphQL meta field (e.g. __typename) then there's no EV mapping for it but still
    // we must fetch the data from DB to populate it. If the enclosing EV is a standard view then use
    // its id field. Otherwise, skip this fieldPart and effectively put the intermediate field path
    // into fetches - in reality this wouldn't lead to any unnecessary joins since the EV in that case
    // corresponds to an embeddable entity.
    for (String metaFieldTypeName: field.getObjectTypeNames()) {
        // Meta field might have more parent types (in case of a union type). Try to find a first
        // ViewType among them and use its id field to ensure the type info can be obtained from DB.
        ManagedViewType<?> managedViewType = typeNameToViewType.get(metaFieldTypeName);
        if (managedViewType instanceof ViewType) {
            mappedFields.add(((ViewType<?>) managedViewType).getIdAttribute().getName());
            break;
        }
    }
    continue;
}

We see a regression (all query fields null in the results), that for example this query

query GetLdapConfigurations {
  ldapServerConfigurations(first: 100, offset: 0) {
    edges {
      node {
        id
        hostname
        enabled
        groupConfiguration {
          id
          __typename
        }
        __typename
      }
    }
  }
}

fails, since the created EntityViewSetting include a fetcher for id which we assume causes the query planer to fetch the identity of the view (only).

When we compare the old implementation, the same query has no fetchers (as to be expected), with the new one, we have the id fetcher. If we remove __typename, we see the old behavior, no fetchers and the query works alright.

beikov commented 1 year ago

I don't quite understand the problem. Both queries {__typename } and {id __typename} will only fetch the id of GroupConfiguration, so what is the problem you are having?.

EugenMayer commented 1 year ago

@beikov the problem is as following

query GetLdapConfigurations {
  ldapServerConfigurations(first: 100, offset: 0) {
    edges {
      node {
        id
        hostname
        enabled
        groupConfiguration {
          id
          __typename
        }
        __typename
      }
    }
  }
}

This query is supposed to fetch id, hostname, enabled, groupConfiguration.id ... and more. Using the version from 2 Jan (including the new DGS mutator things) this will work without issues.

The current master, du to the changes made in GraphQLEntityViewSupport#applyFetches in Jan 5+ will only return the top level id. This only happens if the query includes the __typename.

The reason is

if (managedViewType instanceof ViewType) {
            mappedFields.add(((ViewType<?>) managedViewType).getIdAttribute().getName());
            break;
 }

That a fetcher (for id) is added, which seem to override the entire field selection. When debugging the settings with the version from Jan 4 no fetcher is added - as none is needed.

@david-kubecka might be more aware of the changes. Since i do not know the intention and the method is a mutli-level for loop with several conditions .. not sure anybody knows its intention :)

david-kubecka commented 1 year ago

@EugenMayer Can you please state the exact expectations vs. reality about the output, e.g. in the JSON format?

EugenMayer commented 1 year ago

Iam not sure what to explain here. No matter what fields you request, simply no matter, only the ID is returned if __typename is set.

david-kubecka commented 1 year ago

@EugenMayer I can't reproduce your issue. In fact, there already is a query very similar to yours among the test cases: https://github.com/Blazebit/blaze-persistence/blob/fdd739496f03b39a6e2620dd537edfb72dbf2b1c/examples/spring-data-spqr/src/test/java/com/blazebit/persistence/examples/spring/data/spqr/repository/SampleTest.java#L114-L139. As you can see it uses the combination of __typename, id and other fields and the related test case testRequestScope runs just fine.

Can you please experiment with this query and help me find the offending setup? The schema in this case is auto-generated from EVs which you can of course enhance/modify for test purposes. Once you have the reproducer I'm willing to work on the fix.

EugenMayer commented 1 year ago

@david-kubecka my problem currently is, i see about 20 sequential commits on the master referencing https://github.com/Blazebit/blaze-persistence/issues/1653 as a root and i'am entirely unaware what has been the motivation or technical change behind it.

Providing a reproducer is not as easy as this if i do not understand the corner-stones of this change. Currently my state is 'it worked before that commit and has been broken with the other'.

My view specs are

@EntityView(LdapServerConfiguration.class)
public interface LdapServerConfigurationGraphQlView extends LdapServerConfigurationGraphQlExcerptView
{
  LdapGroupConfigurationGraphQlExcerptView getGroupConfiguration();
}
@EntityView(LdapServerConfiguration.class)
public interface LdapServerConfigurationGraphQlExcerptView extends LdapServerConfigurationIdView
{
  @NotNull
  Boolean getEnabled();

  @NotNull
  String getReadonlyUser();

  @NotNull
  String getReadonlyPassword();

  @NotNull
  LdapServerType getType();

  @NotNull
  String getHostname();

  @NotNull
  Integer getPort();

  @NotNull
  LdapServerEncryption getEncryption();

  String getCertificate();

  @NotNull
  @MappingSingular
  Set<String> getBaseDNs();

  String getCustomerProvidedAccountFilter();

  @NotNull
  String getAccountIdAttr();

  @NotNull
  String getAccountEmailAttr();
}

with

@EntityView(LdapGroupConfiguration.class)
public interface LdapGroupConfigurationGraphQlExcerptView extends LdapGroupConfigurationIdView
{
  @NotNull
  Boolean getSyncEnabled();

  @MappingSingular
  Set<String> getBaseDNs();

  @NotNull
  String getIdAttr();

  @NotNull
  String getNameAttr();

  @NotNull
  String getMemberAttr();

  String getObjectClass();

  String getCustomerProvidedFilter();
}

I intentionally left the class inheritance information in here if this is a cause of whatever.

I can see 2 specialities in this view

  @NotNull
  @MappingSingular
  Set<String> getBaseDNs();

and an nested view

  LdapGroupConfigurationGraphQlExcerptView getGroupConfiguration();

beside that we have 'primitives' and enums, nothing more. Is there any way you can see a related aspect with your changes so we can create a rough reproduced of the case?

a) are there any tests with graphql views, which include nested graphql views? (and __typename)?

david-kubecka commented 1 year ago

@EugenMayer I don't want you to debug the issue (by looking at the commit history or any other method). I just need a reproducer. As I said there's already almost the same case (with nesting, __typename, paging) in the tests (see link above). This test passes. As you know the structure and specifics of your data I ask you to try to match those specifics to that test. Typically this involves appending existing or creating new classes/entity views. If you have a suspicion that some particular annotation might be a culprit then go add it there. I know this is not easy but that's exactly what I was doing when reporting or fixing a reported issue 🤷

EugenMayer commented 1 year ago

@david-kubecka i cannot exactly say that this issue took any effort to reproduce: https://github.com/Blazebit/blaze-persistence/pull/1717/files#diff-5403ad5b75e6642c52412fd81c23413d1c703f37b57944b98b617d619fc730acR100

Using the most basic test this just blows up right in my face.

david-kubecka commented 1 year ago

Using the most basic test

Interesting. Apparently, the issue manifests only in a simple scenario. In the more complex one (with nesting etc.) as in the existing test case there are no problems. Will have a look at that.

EugenMayer commented 1 year ago

@david-kubecka great to hear. This is blocking the entire main branch for us, which is also critical to work on the spring boot 3.0 integration. If we could get this out of the way, would help greatly. thanks!

david-kubecka commented 1 year ago

Update: Upon closer look, I realized a few things:

@beikov @EugenMayer Do you have any opinions about this?

david-kubecka commented 1 year ago

Also, the issue isn't actually related to __typename at all. Its presence in the selection set just triggers a bug that was in the DGS implementation forever. Namely, that no fields were actually registered to fetches in GraphQLEntityViewSupport due to the above-mentioned mismatch. The BP query engine deals with empty fetches as if the user requested everything, therefore the query previously seemed to work although it was suboptimal (always fetching the whole model from DB instead of just the selection set). The __typename change mentioned in the original description just highlighted this bug by always putting something (the id) to the fetches, therefore no fields other than id and __typename are present in the result.

I've tried to modifying addObjectTypeDefinition and create methods in GraphQLEntityViewSupportFactory which deal with Relay types generation but the code seems to me quite convoluted so none of the combinations I tried worked :-( The only possible (and fair) solution seems to be to revert DGS to the previous faulty state and explicitly acknowledge (via a flag) that the fetches optimization doesn't work for it and completely disable it.

@EugenMayer Let me know if this works for you or you want to dig deeper and try to fix the optimization.

@beikov I think that the root cause of all this trouble is the silent ignoring of errors in https://github.com/Blazebit/blaze-persistence/blob/fdd739496f03b39a6e2620dd537edfb72dbf2b1c/integration/graphql/src/main/java/com/blazebit/persistence/integration/graphql/GraphQLEntityViewSupport.java#L670-L676. Is there any valid reason why the mapping would not be initialized for a type? If not then IMO even throwing a NPE would be better than the current state because errors would be caught early and not during some unrelated refactors.

EugenMayer commented 1 year ago

@david-kubecka interesting - we actually found the same during debugging (empty fetches) and where "wondering", but comparing it with the pre-jan implementation we though that "no fetches" is no deal, it is only for "sub views" (we thought). It is interesting that this turns out to be a regression of a bug that was there all the time.

@beikov i would really love to rather fix the fetches then "fetch it all" - this really can become a huge deal for us fetching half of the database. i also have hard times digging through the code so i'am not sure i can be of any help here.

beikov commented 1 year ago

Sounds to me that typeNameToViewType and typeNameToFieldMapping do not contain entries for the XEdge types, which would be a bug that should be easy to solve.

david-kubecka commented 1 year ago

Yeah, specifically typeNameToFieldMapping doesn't contain the CatWithOwnerViewNode. I've tried adding it there and got some error indicating that more changes need to be made, potentially going down the rabbit hole which I wasn't ready to do myself. Perhaps @EugenMayer is brave enough :-) At least there are tests that tell you once you succeed.

Still, I'm wondering why there's this discrepancy between different GrapqhQL tools regarding the generated Relay types (CatWithOwnerViewNode vs CatWithOwnerView). Is there any strong reason for that?

david-kubecka commented 1 year ago

@beikov Meanwhile, I'm thinking of introducing a public flag which will disable the silent null fieldMapping swallowing I talked about above. I would like to enable the stricter check at least for my project so that it fails explicitly when the "fetches" optimization doesn't work. Otherwise these types of bugs are hard to spot (you would need to look at the DB traffic).

david-kubecka commented 1 year ago

PR here: https://github.com/Blazebit/blaze-persistence/pull/1720

EugenMayer commented 1 year ago

Still, I'm wondering why there's this discrepancy between different GrapqhQL tools regarding the generated Relay types (CatWithOwnerViewNode vs CatWithOwnerView). Is there any strong reason for that?

Thank you for digging @david-kubecka ! i think that is discussed here right now #1719 if you refer to the fact that nodes in a connect.edge are not of the type CatWithOwnerView but rather of the type CatWithOwnerViewNode

In fact i just implemented in #1717 what we discussed with @beikov in #1719 in this branch and yes, the tests to pass now.

@beikov since we are planning to have the the behavior, where a connection.edge.node type is no longer ViewNode, but View, to be the new default and the "old way" is problematic, i`am not sure we should keep the old ViewNode as an option. Of course, as we discussed, i added an option and documented it in the CHANGELOG, but i'am not sure it is worth fixing it if we have no good reason why it was implemented in the first place (especially considering the issues we found here)

david-kubecka commented 1 year ago

if you refer to the fact that nodes in a connect.edge are not of the type CatWithOwnerView

That's exactly what I meant. I'm glad that the issue is being resolved the correct way, i.e. the root cause is fixed and not some hacky workaround for the consequence introduced.

david-kubecka commented 1 year ago

Can you also please try my PR #1720 on your branch and set the strict check on in all examples? With our two fixes combined I think all tests should pass.