Netflix / dgs-codegen


Add Boolean fields to Data classes for supporting sparse update #664

Closed krutikavk closed 2 months ago

krutikavk commented 3 months ago

Description

Issue: #609

Add boolean fields to all data classes. Here is a summary of the changes:

  1. Generate boolean fields for Java data classes by default; the additional flag previously exposed in the CodeGen config has been removed.
  2. Add a boolean for each field in the data class, named is<Field>Defined.
  3. Add a getter for the boolean, named getIs<Field>Defined.
  4. The setter for each field explicitly sets the value of is<Field>Defined.
  5. Update the boolean field and getter similarly in the Builder class.
  6. Update the Builder.build() function to use the setter for each field of the data class.
  7. Note: since this feature is enabled by default, most tests needed updates to account for the additional fields added to the data classes.

Example of data classes created (a rough sketch also follows below):

  1. Schema
  2. Generated data types with this change for schema here
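
For illustration, here is a rough sketch of what a generated Java type with these flags could look like (the class and field names are hypothetical, not taken from the linked schema):

public class MovieFilter {
  private String title;

  // Tracks whether setTitle() was called, even if the supplied value was null
  private boolean isTitleDefined;

  public String getTitle() {
    return title;
  }

  public void setTitle(String title) {
    this.title = title;
    this.isTitleDefined = true;
  }

  public boolean getIsTitleDefined() {
    return isTitleDefined;
  }

  public static class Builder {
    private String title;
    private boolean isTitleDefined;

    public Builder title(String title) {
      this.title = title;
      this.isTitleDefined = true;
      return this;
    }

    public MovieFilter build() {
      MovieFilter result = new MovieFilter();
      if (isTitleDefined) {
        // Routing through the setter carries the flag over (the actual generated build() may differ)
        result.setTitle(title);
      }
      return result;
    }
  }
}

Only fields that were explicitly set (even to null) flip their flag, which is what lets a mutation handler distinguish "omitted" from "set to null".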

Thanks!

Validation

Sample schema and codegen with bitset: https://github.com/krutikavk/dgs-codegen-run

krutikavk commented 3 months ago

Hi @srinivasankavitha, can you take a look at the PR?

srinivasankavitha commented 3 months ago

Thanks for the PR! Will do later this week.

krutikavk commented 3 months ago


Thanks @srinivasankavitha. I am also working on updating the Kotlin codegen. What is the difference between generators/kotlin and generators/kotlin2--should I be updating both? The flag that enables Kotlin2 data type generation, generateKotlinNullableClasses, is not exposed as a flag in the codegen CLI.

srinivasankavitha commented 3 months ago

Yes, that flag is specific to kotlin2. It would be great if you can add it to both, since we try to maintain feature parity as much as possible.

krutikavk commented 3 months ago


Sounds great, thank you 👍

srinivasankavitha commented 3 months ago

Also, you can run ./gradlew formatKotlin to fix linting errors that are failing the build.

krutikavk commented 3 months ago

Linting and some unit tests fixed, please check @srinivasankavitha

krutikavk commented 3 months ago

@srinivasankavitha I see the linter changed many files across the repo--can you please check whether these changes are what the build requires? Since the last failure (7a1d567), I added one more lint fix that had slipped through.

srinivasankavitha commented 3 months ago

@krutikavk - I'm reviewing your PR and have a few questions: 1) Can you provide an example of how this newly generated class will be used? Does a user call isFieldSet() explicitly? It is marked private, so I'm unclear how, as a user, I would use this feature. 2) Did you make any changes to the tests under integTests? If not, could you remove those files, since they seem to be all formatting updates and add a lot of noise to the PR?

krutikavk commented 3 months ago

  1. Issue #609: This feature ensures proper handling of optional fields in GraphQL mutations. With the additional fields, codegen deserialization can mark which fields were not provided in a mutation, so that the DGS/subgraph implementation can handle sparse updates. The feature ensures that optional fields are not overwritten with default values or null, and hence provides a more effective and desirable way of handling mutations in GraphQL.

Since the object mappers in the DGS framework rely on reflection, the methods isSet() and setField() can stay private.

We will follow up with a more comprehensive solution that works with the DGS runtime as well: the mappers for Java and Kotlin objects will be updated to use the BitSet-related fields to handle sparse updates correctly.

A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes

  2. I have removed all the additional changes that the formatter made to integTests.

Please advise if you have any alternative suggestions to this approach.

srinivasankavitha commented 3 months ago


Hi @krutikavk - thanks for the follow up. Regarding this change:

Since the object mappers in the DGS framework rely on reflection, the methods isSet() and setField() can stay private.

We will follow up with a more comprehensive solution that works with the DGS runtime as well: the mappers for Java and Kotlin objects will be updated to use the BitSet-related fields to handle sparse updates correctly.

A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes

I have concerns about the proposed update to the object mappers in the DGS framework. I think having the isSet method public would give users more flexibility to check the input values and whether they are set. This, to me, is a very useful feature indeed. However, updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior. Exactly what functionality would go into the framework is also unclear to me. I did take a look at the linked example, so I see the implementation details of the advancedMapper, but how that will be exposed and used in the framework is unclear.

I would still prefer having isSet publicly exposed so it provides value regardless of whether the object mapper uses it or not. As you mentioned, we should also have a separate discussion regarding the updates you intend to make to the object mapper.

krutikavk commented 3 months ago


Thanks for the detailed feedback @srinivasankavitha!

updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior

To elaborate further, mapToJavaObject in the DGS framework will check for the presence of the BitSet fields (i.e., sparse update support) and use these fields to support sparse updates as applicable.

The DGS framework object mapper would be updated, like in the POC here, to check for the presence of BitSet support for sparse updates.

Additionally, since the BitSet field is transient, it will not affect serialization of the data class. During deserialization, the BitSet field will be set to its default value (0) for all fields. The behavior is unchanged if BitSet fields are not present.

As advised, I have also updated the functions for accessing the BitSet fields, isSet() and setField(), to be public, and updated the unit tests to reflect this.
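
For context, a rough sketch of the BitSet-based shape described above (class, field, and enum names are hypothetical; the code in the linked POC may differ):

import java.util.BitSet;

public class CustomerInput {
  // An enum avoids hard-coding an ordinal for each field in the BitSet
  public enum Field { NAME, EMAIL }

  // Transient, so the presence-tracking state is not part of the serialized data
  private transient BitSet fieldsSet = new BitSet(Field.values().length);

  private String name;
  private String email;

  public void setName(String name) {
    this.name = name;
    setField(Field.NAME);
  }

  public void setEmail(String email) {
    this.email = email;
    setField(Field.EMAIL);
  }

  // Public, as discussed, so user code can query field presence directly
  public boolean isSet(Field field) {
    return fieldsSet.get(field.ordinal());
  }

  public void setField(Field field) {
    fieldsSet.set(field.ordinal());
  }
}

A data fetcher could then check, for example, input.isSet(CustomerInput.Field.EMAIL) before deciding whether to touch the stored value.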

paulbakker commented 3 months ago

Since the object mappers in the DGS framework rely on reflection, the methods isSet() and setField() can stay private.

I think this feature should really be usable stand-alone, without any framework changes. E.g., in user code (in a data fetcher) you should be able to check which fields are set and act on that in your code. I don't like the idea of requiring reflection to make this work, or of needing framework changes for a codegen feature.

Codegen and the framework have been completely independent, although obviously working nicely together, to the point that even non-DGS users are using codegen. This also ensures decoupling between the two. Not every DGS user wants codegen, and I don't think we should cross a line where the framework "knows" about codegen.

kilink commented 3 months ago

Could we take a step back and discuss how this is intended to work end-to-end?

What it sounds like to me in this proposal is that the DefaultInputObjectMapper would need to call a private method, which I am not really in favor of. Users can supply any arbitrary class for mapping, I'd prefer not to rely on that implicit API. Furthermore, at least for mapping to Java objects, we do use setters, so I don't think the method is necessary.

I think the BitSet is overkill / premature optimization, and in practice probably performs worse than boolean fields. Input types are rarely going to have that many fields, so we're talking about a handful of boolean fields vs. an object reference to a BitSet which must be allocated, plus the generated enum. I think a better approach would be to just add boolean fields for each field in the schema type (something like fooDefined for a schema field named foo), and associated getters (isFooDefined for a schema field named foo).

Finally, there's an issue of how this will work with defaults. As far as I can tell, it will only function for input types with nullable fields; fields with default values will have the defaults populated by graphql-java before the DGS mapper is even invoked, so there's no way for us to know whether the default was supplied explicitly by the caller or not. This would only really work for nullable fields that aren't explicitly sent in the request.

Example schema:

input InputType {
  foo: String
  bar: String!
  baz: String! = "default"
  blah: String = "default-blah"
}

Example variables:

{"input": {"bar": "bar"}}

The DGS mapper is going to get a map with the following contents:

{"bar": "bar", "baz": "default", "blah": "default-blah"}

So it's too late for the mapper to actually tell that baz and blah were not explicitly sent. The only thing we could ascertain is that foo is missing. The proposed feature can only work for nullable fields without any defaults in that case, so I don't know if it makes sense to even generate boolean fields / getters for schema fields with default values.

ramapalani commented 3 months ago

@paulbakker, @kilink

Here is a POC on how this would work end to end. https://github.com/ramapalani/sparseupdate/tree/reflected-model-setBitset

This solution is inspired by one of our existing solutions. I'm not married to this solution, but we have to solve this sparse update issue, so I'm open to any suggestions. If required, I can set up a short Zoom call to go over this.

kilink commented 3 months ago

I'd just mention that you can do this today by just checking the DataFetchingEnvironment. E.g.,

dfe.getArgument<Map<String, Any?>>("input").containsKey("foo")

would tell you whether a null value was supplied explicitly or not.

As Paul also mentioned, I think we would not be in favor of changing the InputObjectMapper to use reflection to call a special method. I'm also hesitant to add a generated method that requires passing in an enum to check whether a field is set. I think a simpler approach would be to simply add new getter methods, as I mentioned before, and use boolean fields. At least for the Java mapping path, we already call setters before attempting to set fields directly, so the generated setters would have a chance to set the flags.
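
For illustration, a minimal Java sketch of this DataFetchingEnvironment-based check inside a data fetcher (the mutation, argument, and field names are hypothetical, not from this PR):

import com.netflix.graphql.dgs.DgsComponent;
import com.netflix.graphql.dgs.DgsMutation;
import graphql.schema.DataFetchingEnvironment;
import java.util.Map;

@DgsComponent
public class UpdateDataFetcher {

  @DgsMutation
  public String updateCustomer(DataFetchingEnvironment dfe) {
    // Raw argument map produced by graphql-java, before any input-object mapping
    Map<String, Object> input = dfe.getArgument("input");

    // containsKey distinguishes "email omitted" from "email explicitly set to null"
    boolean emailProvided = input.containsKey("email");

    return emailProvided
        ? "email was sent: " + input.get("email")
        : "email was omitted; leave the stored value untouched";
  }
}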

srinivasankavitha commented 3 months ago

I agree with @kilink and @paulbakker.

The codegen feature on its own is useful, and I like @kilink's simplification of avoiding the use of BitSet altogether, since the number of fields is really not going to be that many.

For the framework, using the data fetching environment directly to determine whether a field was passed in would be ideal. You could add custom utilities to take care of that in your code without messing with the framework's InputObjectMapper.

ramapalani commented 3 months ago

Let me summarize my understanding:

  • DGS runtime: DefaultInputObjectMapper shouldn't be changed to accommodate this feature, especially not to call a private method via reflection
  • DGS CodeGen:

    • Avoid the BitSet; instead, create a boolean for each field, with a getter/setter for that boolean. Say the field is foo: the getter would be boolean isFieldFoo() and the setter would be void setFieldFoo(boolean)
    • The individual field setter in the generated code should call the boolean setter; for the field foo, its setter would, for example, also call setFieldFoo(true)

Is my understanding correct? I'm OK with this approach, as it helps the developer implementing the DataFetcher rely on @InputArgument to map the object and still implement sparse updates.

If you confirm, @krutikavk can update this PR and send it for your review

kilink commented 3 months ago

  • DGS runtime: DefaultInputObjectMapper shouldn't be changed to accommodate this feature, especially not to call a private method via reflection
  • DGS CodeGen:

    • Avoid the BitSet; instead, create a boolean for each field, with a getter/setter for that boolean. Say the field is foo: the getter would be boolean isFieldFoo() and the setter would be void setFieldFoo(boolean)
    • The individual field setter in the generated code should call the boolean setter; for the field foo, its setter would, for example, also call setFieldFoo(true)

Yes, I think that would be the preferred approach. My only minor quibble there is with the naming of the getter, and the need for a setter for the flag at all. A better name for the getter might be isFooDefined or isFooExplicitlyDefined or something like that. And as for the setter for the flag, I don't think it's necessary; couldn't the boolean field be set directly in the field's setter?
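
For example, the minimal shape of that suggestion might look like this (a sketch for a schema field named foo, not actual generated output):

public class FooInput {
  private String foo;
  private boolean fooDefined;

  public void setFoo(String foo) {
    this.foo = foo;
    // The flag is flipped directly here; no separate setFieldFoo(boolean) is generated
    this.fooDefined = true;
  }

  public String getFoo() {
    return foo;
  }

  public boolean isFooDefined() {
    return fooDefined;
  }
}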

Only other things I'd consider:

kilink commented 3 months ago

Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.

krutikavk commented 3 months ago

Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.

@kilink Is there an ETA for this yet?

krutikavk commented 3 months ago

Thanks all for weighing in, I'll update the PR for a fresh review.

krutikavk commented 3 months ago

Hey @kilink @srinivasankavitha @paulbakker

Sorry for the delay, I have updated the PR with the following changes as per discussions:

  1. Generate boolean fields for Java data classes by default; the additional flag previously exposed in the CodeGen config has been removed.
  2. Add a boolean for each field in the data class, named is<Field>Defined.
  3. Add a getter for the boolean, named getIs<Field>Defined.
  4. The setter for each field explicitly sets the value of is<Field>Defined.
  5. Update the boolean field and getter similarly in the Builder class.
  6. Update the Builder.build() function to use the setter for each field of the data class.
  7. Note: since this feature is enabled by default, most tests needed updates to account for the additional fields added to the data classes.

Example of data classes created:

  1. Schema
  2. Generated data types with this change for schema here

Also, I have not updated the classes' hashCode() and toString() methods to reflect the additional boolean fields. Would you want the boolean fields exposed in these functions as well? Do let me know and I can update this shortly as needed.

Thanks!

gkesler commented 3 months ago

@srinivasankavitha

However, updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior. Exactly what functionality would go into the framework is also unclear to me

Could you elaborate on how this feature will change the deserialization behavior? The values will be read from the JSON stream exactly as they are supposed to be read, and the deserializer will set them on the target object via reflection calls to the respective mutators. The mutators, along with storing the field values, will also raise flags indicating that the field value was mutated. That will help distinguish mutated field values from values assigned by default, even if they are nulls.

gkesler commented 3 months ago

@kilink

What it sounds like to me in this proposal is that the DefaultInputObjectMapper would need to call a private method, which I am not really in favor of. Users can supply any arbitrary class for mapping, I'd prefer not to rely on that implicit API. Furthermore, at least for mapping to Java objects, we do use setters, so I don't think the method is necessary.

There will be absolutely NO changes to ObjectMappers. Any JSON deserializer (not only Jackson, but any other) will work with this out of the box. All changes are encapsulated into the receiving generated POJOs.

I think the BitSet is overkill / premature optimization, and in practice probably performs worse than boolean fields. Input types are rarely going to have that many fields, so we're talking about a handful of boolean fields vs. an object reference to a BitSet which must be allocated, plus the generated enum. I think a better approach would be to just add boolean fields for each field in the schema type (something like fooDefined for a schema field named foo), and associated getters (isFooDefined for a schema field named foo).

https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-2.html#jvms-2.3.4 boolean value takes 1 byte. However, compilers align each value allocation to a machine word boundary, which means that on 64-bit processors booleans occupy 8 bytes. That's a lot of wasted memory.

Additionally, in our data models we are using very heavy coarse objects. Making them smaller would increase api chattiness and impact our user experiences.

Finally, there's an issue of how this will work with defaults. As far as I can tell, it will only function for input types with nullable fields; fields with default values will have the defaults populated by graphql-java before the DGS mapper is even invoked, so there's no way for us to know whether the default was supplied explicitly by the caller or not. This would only really work for nullable fields that aren't explicitly sent in the request.

null is a special case of a default, isn't it? Defaults will work exactly like null. Default values (including nulls) are assigned by the object constructor, which does not call mutators and assigns values directly to the fields. So the corresponding is<Field>Set methods will not report that these fields were mutated. During unmarshalling, the deserializer will call mutators to set values from the stream, so after unmarshalling the is<Field>Set methods will report that those fields were mutated. This is the heart of this approach.

kilink commented 3 months ago

There will be absolutely NO changes to ObjectMappers. Any JSON deserializer (not only Jackson, but any other) will work with this out of the box. All changes are encapsulated into the receiving generated POJOs.

The initial design absolutely involved updating the DefaultInputObjectMapper to call a special method. There's been discussion since then so the message you're replying to is out of date. There was never discussion about JSON deserialization, the term ObjectMapper is overloaded here but we were discussing the internal DGS mapper, e.g., DefaultInputObjectMapper.

https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-2.html#jvms-2.3.4 boolean value takes 1 byte. However, compilers align each value allocation to a machine word boundary, which means that on 64-bit processors booleans occupy 8 bytes. That's a lot of wasted memory.

Additionally, in our data models we are using very heavy coarse objects. Making them smaller would increase api chattiness and impact our user experiences.

You can use a tool like jol to look at the layout. What you're saying I don't think is an issue here, since booleans will get packed together in the object header, so it's not as though they each occupy 8 bytes as far as I can tell. A BitSet will also occupy a pointer as well, so 8 bytes, plus the instance itself, and a BitSet has a reference to a long[]. Some of the math changes if you're using compressedOops or not but it's hard for me to see a scenario where boolean fields would be less space efficient. We are talking about objects that are typically going to have a handful of fields. There are other considerations here as well, e.g., with respect to CPU performance, the BitSet is going to always be worse here, there are just not enough items to justify needing an array of longs, the added pointer chasing, etc. And finally the main issue is with readability and maintainability, using boolean fields is just simpler to maintain, especially with respect to the original proposal that also generated an inner enum class. I'd want to see real data proving that it is causing a performance issue before resorting to such an optimization.
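
For reference, a minimal sketch of how the two layouts could be compared with jol (assumes the jol-core dependency on the classpath; the input classes are simplified stand-ins, not the actual generated types):

import java.util.BitSet;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

public class LayoutComparison {
  // Simplified stand-ins for the two generated-code variants being compared
  static class BooleanFlagsInput {
    String foo;
    String bar;
    boolean fooDefined;
    boolean barDefined;
  }

  static class BitSetInput {
    String foo;
    String bar;
    transient BitSet fieldsSet = new BitSet(2);
  }

  public static void main(String[] args) {
    // Per-class layout: shows how the boolean fields are packed after the object header
    System.out.println(ClassLayout.parseClass(BooleanFlagsInput.class).toPrintable());

    // Whole-graph footprint: includes the separately allocated BitSet and its backing long[]
    System.out.println(GraphLayout.parseInstance(new BooleanFlagsInput()).toFootprint());
    System.out.println(GraphLayout.parseInstance(new BitSetInput()).toFootprint());
  }
}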

null is a special case of a default, isn't it? Defaults will work exactly like null. Default values (including nulls) are assigned by the object constructor, which does not call mutators and assigns values directly to the fields. So the corresponding is<Field>Set methods will not report that these fields were mutated. During unmarshalling, the deserializer will call mutators to set values from the stream, so after unmarshalling the is<Field>Set methods will report that those fields were mutated. This is the heart of this approach.

How would they be assigned by the object constructor? I don't know if we're on the same page if you're referring to deserialization and unmarshalling; that doesn't come into the picture here at all. DGS leverages graphql-java to deserialize the JSON request, and variables and arguments get converted to a Map before DGS even touches them. The only thing that comes into play at that layer is perhaps any custom Coercing implementations registered. DGS then maps the Map<String, ?> we get from graphql-java to a concrete type based on the parameters of the data fetcher methods; that happens in DefaultInputObjectMapper today.

For mapping to Java objects, today that works by invoking a no-args constructor and all fields are populated via setters.

gkesler commented 3 months ago

The initial design absolutely involved updating the DefaultInputObjectMapper to call a special method. There's been discussion since then so the message you're replying to is out of date. There was never discussion about JSON deserialization, the term ObjectMapper is overloaded here but we were discussing the internal DGS mapper, e.g., DefaultInputObjectMapper.

Yes, I was not exactly correct. GraphQL-Java first deserializes the input into a map, and then the InputObjectMapper converts it to a POJO. Pretty much the same as what ObjectMapper.convertValue is doing. The point is that this solution works in both cases.

How would they be assigned by the object constructor? I don't know if we're on the same page if you're referring to deserialization and unmarshalling; that doesn't come into the picture here at all. DGS leverages graphql-java to deserialize the JSON request, and variables and arguments get converted to a Map before DGS even touches them. The only thing that comes into play at that layer is perhaps any custom Coercing implementations registered. DGS then maps the Map<String, ?> we get from graphql-java to a concrete type based on the parameters of the data fetcher methods; that happens in DefaultInputObjectMapper today.

For mapping to Java objects, today that works by invoking a no-args constructor and all fields are populated via setters.

Thanks for the correction. Yes, per the above comments, the mapping occurs from the map deserialized by GraphQL-Java.

kilink commented 3 months ago

Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.

@kilink Is there an ETA for this yet?

There's no ETA as of yet; a lot of focus has been on getting the spring-graphql integration out, and now that that is done it will be easier to make progress on other items. I do think there is some nuance here, since the current generated Kotlin classes rely on default arguments. I'm fine with proceeding with just supporting this for Java at the moment and expanding support for Kotlin generated classes later, although I don't know if that works for your codebase?

krutikavk commented 3 months ago


Sounds good @kilink! For the Kotlin code generator, if there is a specific issue and a solution identified, I'd be open to contributing explicit setters. That would especially help us deliver this feature in its entirety. I'd love to hear your thoughts.

I will also get back shortly with an analysis of memory utilization and any potential performance impact for both proposed solutions.

krutikavk commented 3 months ago

Hi @kilink @srinivasankavitha

Here are some updates on the discussion of this solution so far. Do share your thoughts on the best way to move forward with the change.

  1. I analyzed both potential solutions for storing field presence (BitSet and boolean fields) using JOL.

TL;DR: BitSet is more efficient in heap memory than boolean fields. This is evident for classes with a large number of fields (I compared classes with 3 fields vs. 33 fields). The median application we are looking at has about 288 fields; this disparity may be significant for applications of this size, hence the inclination toward a BitSet-based solution.

Please refer to this for a deeper look at the Java objects and the classes. Do share your thoughts on this.

  2. Based on previous feedback, I acknowledge that this feature should be delivered without updating the DGS runtime. Similar to the current boolean field-presence solution discussed here, only the setters need to be updated in the BitSet solution, as the DGS runtime already uses them. Adding an enum to the BitSet solution does add a level of operability, since code owners don't have to hard-code ordinals for each field in the BitSet.

If there are additional views to discuss, I'd be happy to facilitate a short meeting to work out the best way forward for this feature.

Once confirmed, I can update the PR with the best-fit solution. Thanks!

cc: @ramapalani @gkesler

kilink commented 2 months ago

TL;DR: BitSet is more efficient in heap memory than boolean fields. This is evident for classes with a large number of fields (I compared classes with 3 fields vs. 33 fields).

Even with 33 fields, using boolean fields yields a smaller object footprint (184 bytes vs. 200 bytes). Can you share how you arrived at it being worse for 33 fields? You need to look at the object graph and include the overhead of allocating a BitSet as well. It seems to me that using a BitSet would only be smaller when you approach 60 fields or so. I think we are wasting time even bothering to look at this, unless you know it's an issue in practice. If it is, please share the profile data that demonstrates it is an issue. As it is, we're talking about a difference of a few bytes in extreme cases; if the application is that memory-constrained, you are likely to find much better places to optimize elsewhere.

The median application we are looking at has about 288 fields; this disparity may be significant for applications of this size, hence the inclination toward a BitSet-based solution.

That would have been useful to know earlier on in the discussion. That said, the framework has to support all users, not just your specific use case; that many fields is likely far from typical. Memory usage is only one dimension, there are other concerns such as maintainability of dgs-codegen or CPU utilization. Do you have evidence to suggest that it matters in practice, such as profile data?

Taking a step back, are you saying you have a schema with 288 fields that are nullable, and none of them have defaults? These flags are only useful for nullable fields with no defaults; as I suggested before, we can just generate the fields and getters for the subset of schema fields where it matters.

krutikavk commented 2 months ago


Even with 33 fields, using boolean fields yields a smaller object footprint (184 bytes vs. 200 bytes). Can you share how you arrived at it being worse for 33 fields? You need to look at the object graph and include the overhead of allocating a BitSet as well. It seems to me that using a BitSet would only be smaller when you approach 60 fields or so. I think we are wasting time even bothering to look at this, unless you know it's an issue in practice. If it is, please share the profile data that demonstrates it is an issue. As it is, we're talking about a difference of a few bytes in extreme cases; if the application is that memory-constrained, you are likely to find much better places to optimize elsewhere.

@kilink Here is a rough analysis of the boolean vs. BitSet solutions: https://github.com/krutikavk/sparse-update-analysis. The BitSet version uses 152 bytes while the boolean version uses 184 bytes.

This article was an informative jumping-off point: https://www.baeldung.com/java-boolean-array-bitset-performance#:~:text=In%20terms%20of%20cardinality%20throughput,to%20the%20corresponding%20boolean%5B%5D


That would have been useful to know earlier on in the discussion. That said, the framework has to support all users, not just your specific use case; that many fields is likely far from typical. Memory usage is only one dimension, there are other concerns such as maintainability of dgs-codegen or CPU utilization. Do you have evidence to suggest that it matters in practice, such as profile data?

Let me gather more information here; we have seen performance-related issues with the boolean-fields solution and have moved to a BitSet-based one.

kilink commented 2 months ago

@kilink Here is a rough analysis of the boolean vs. BitSet solutions: https://github.com/krutikavk/sparse-update-analysis. The BitSet version uses 152 bytes while the boolean version uses 184 bytes.

That is inaccurate, since you are only looking at the class layout. The BitSet itself has to be allocated, and if you account for that it comes out worse.

kilink commented 2 months ago

This article was an informative jumping-off point: https://www.baeldung.com/java-boolean-array-bitset-performance#:~:text=In%20terms%20of%20cardinality%20throughput,to%20the%20corresponding%20boolean%5B%5D

That article uses a boolean array, so it has extra overhead that we wouldn't have, and its conclusion even says exactly what I've been trying to convey: that a BitSet does not perform better at such a low cardinality of items.

kilink commented 2 months ago

I'll reiterate: I don't think this discussion is productive; it boils down to quibbling over a few bytes of overhead in extreme cases. If it's somehow a bottleneck or performance issue for you, then please share some profiling information. This feature hasn't even been released, so it seems premature to be so focused on such an optimization; let's worry about implementing it in the simplest way possible first and foremost. If it goes out and is somehow a problem, then we can discuss optimizations.

I'll ask again about your specific use case, the input object with 288 fields: are all fields nullable with no defaults? If they're not, you can also reduce the number of boolean fields needed by only generating them for the GraphQL fields that need them (it's useless to have a field/getter for a non-nullable field or one with a default value).

krutikavk commented 2 months ago

@kilink @srinivasankavitha @paulbakker Thank you for your feedback and understanding. We agree with proceeding with your suggested solution with Boolean fields in the interest of simplicity and functionality at this stage.

As we move forward, we'll definitely want to revisit this in a future PR to explore further optimizations. To ensure we're making informed decisions, appreciate if you can share what additional profiling information would be help us propose a more optimized solution.

Additional question I had as follow-up once this PR is merged:

  1. I understand Maven codegen uses dgs-codegen-core. Will I need to create an issue or PR to update maven codegen as well?
  2. For Kotlin codegen, what would be the best way forward for delivering the same feature?
srinivasankavitha commented 2 months ago

@kilink @srinivasankavitha @paulbakker Thank you for your feedback and understanding. We agree with proceeding with your suggested solution with Boolean fields in the interest of simplicity and functionality at this stage.

As we move forward, we'll definitely want to revisit this in a future PR to explore further optimizations. To ensure we're making informed decisions, appreciate if you can share what additional profiling information would be help us propose a more optimized solution.

Additional question I had as follow-up once this PR is merged:

  1. I understand Maven codegen uses dgs-codegen-core. Will I need to create an issue or PR to update maven codegen as well?
  2. For Kotlin codegen, what would be the best way forward for delivering the same feature?

Thanks for all the work so far on the PR and for providing more context to help us understand better. For (1) - we do not maintain the Maven plugin as part of this project. It is maintained separately here: https://github.com/deweyjose/graphqlcodegen and https://github.com/michaelboyles/dgs-codegen-maven-plugin

For (2) - we could probably open a new discussion/issue and iterate once we tackle https://github.com/Netflix/dgs-framework/issues/1057.

krutikavk commented 2 months ago

@kilink @srinivasankavitha @paulbakker Thank you for your feedback and understanding. We agree with proceeding with your suggested solution with Boolean fields in the interest of simplicity and functionality at this stage. As we move forward, we'll definitely want to revisit this in a future PR to explore further optimizations. To ensure we're making informed decisions, appreciate if you can share what additional profiling information would be help us propose a more optimized solution. Additional question I had as follow-up once this PR is merged:

  1. I understand Maven codegen uses dgs-codegen-core. Will I need to create an issue or PR to update maven codegen as well?
  2. For Kotlin codegen, what would be the best way forward for delivering the same feature?

Thanks for all the work so far in the PR and the providing more context for us to understand better. For (1) - we do not maintain the Maven plugin as part of this project. It is maintained separately here : https://github.com/deweyjose/graphqlcodegenhttps://github.com/michaelboyles/dgs-codegen-maven-plugin

For(2) - we could probably open a new discussion/issue and iterate once we tackle Netflix/dgs-framework#1057.

Thanks @srinivasankavitha. Would you have a timeline for when this PR can be merged? I see some of the CI tests may need updates for the newly added default fields. Please let me know if any more changes are needed from my end.

srinivasankavitha commented 2 months ago

Thanks @krutikavk - are you able to fix the tests as well? These are failing due to the changes in this PR, correct? We can merge once the build is green

krutikavk commented 2 months ago

Hi @kilink @srinivasankavitha Apologies for the follow-up; I wanted to check in on the PR review. If there are any concerns or questions, I am happy to address them. Thanks for your patience and understanding!

krutikavk commented 2 months ago


I see the CI build just failed at the "Build Examples" stage. Can you share any pointers on whether this needs to be fixed in the PR, and how I can run the CI tests locally to validate CI builds?

srinivasankavitha commented 2 months ago

Both examples are here: https://github.com/Netflix/dgs-examples-java-2.7 and https://github.com/Netflix/dgs-examples-kotlin-2.7. You can run things locally with those examples by applying the codegen plugin from your branch.

krutikavk commented 2 months ago

Thanks @srinivasankavitha I ran through the examples and dropped in a quick fix adding conditional boolean field generation for input type values as well.

The build failure message is as below:

ERROR:Unable to test /home/runner/work/dgs-codegen/dgs-codegen/build/examples/dgs-examples-java-2.7, command '/home/runner/work/dgs-codegen/dgs-codegen/gradlew -p /home/runner/work/dgs-codegen/dgs-codegen/build/examples/dgs-examples-java-2.7 -c /home/runner/work/dgs-codegen/dgs-codegen/build/examples/dgs-examples-java-2.7/settings.gradle.kts clean check' failed! Command '['/home/runner/work/dgs-codegen/dgs-codegen/gradlew', '-p', '/home/runner/work/dgs-codegen/dgs-codegen/build/examples/dgs-examples-java-2.7', '-c', '/home/runner/work/dgs-codegen/dgs-codegen/build/examples/dgs-examples-java-2.7/settings.gradle.kts', 'clean', 'check']' returned non-zero exit status 1.

I am wondering if this fix will address that test as well--I am unsure which test is failing at the moment (per the error above). Can you please retry the build and advise whether I need to do anything else, like updating expected values?

srinivasankavitha commented 2 months ago


This seems to have fixed the build now, right? I see the CI build is green.

krutikavk commented 2 months ago


@srinivasankavitha Yes, I was just able to validate this locally as well; this fixed the build. Please help validate and merge the PR.

krutikavk commented 2 months ago

@kilink Thanks for reviewing, I have updated as per your feedback. Please check if the changes are good

krutikavk commented 2 months ago

Hi @kilink @srinivasankavitha Checking in if you need anything from me before the PR can be merged

srinivasankavitha commented 2 months ago

Hi @krutikavk - we are out this week. We will take a look next week.

ramapalani commented 2 months ago

@srinivasankavitha following up on this. Will you be able to review this PR? And let us know if you need any further changes to the PR

ramapalani commented 2 months ago

@srinivasankavitha Thanks for approving. What is the next step for this PR? Will you be able to merge this or do you require additional approvals?