doanduyhai / Achilles

An advanced Java Object Mapper/Query DSL generator for Cassandra
http://achilles.archinnov.info
Apache License 2.0
241 stars 92 forks source link

basic support of contructor instantiation #307

Closed rmannibucau closed 6 years ago

rmannibucau commented 7 years ago

Overall idea is to introduce a @Factory(/list of field names/) you can put on a constructor to create immutable models.

See ImmutableEntity for a sample.

API is inpired from what is commonly done in the industry (like Jackson/Johnzon/Java with @ConstructorProperties etc...). Alternative I had in mind was to add on @Column a constructorIndex but I like it less cause it requires user to fully control it (including in inheritance) which can be way more complex when it will start to fail after a refactoring.

In term of implementation the constructor propertles are added to the meta model and removed from the "all properties" list which represents the "settable" properties.

One impacting change is the refactoring of BeanFactory which now takes a constructor and list of arguments. In term of feature it provides more (= it includes the old capabilities through the constructor API) but it is a breaking change. To mitigate it we can keep the old BeanFactory and introduce a ParameterAwareBeanFactory and kind of bridge both in the implementation but I'm not sure it does worth the effort (how often is it overriden?).

Please let me know what you think about it and if it can be integrated as a feature to achilles.

doanduyhai commented 7 years ago

Did you read this issue ? https://github.com/doanduyhai/Achilles/issues/264

We had an extensive discussion about support for immutable entities with all the strategies and pros & cons

rmannibucau commented 7 years ago

Didn't read it - good catch!

There are few points which are interesting:

Another open question is if java field names should be used or cql names. Most of mappers use the java field name to "stay java" but the cql would have the advantage to enable achilles to have multiple @EntityCreator and use an exact matching strategy based on the query (which columns were retrieved).

Last question I had was if static factories/methods returning an entity instance in the entity itself should be decorable with @EntityCreator (I don't see any blocker to do it).

doanduyhai commented 7 years ago

you can mix constructor and setters for the loading --> I'm OK with this approach only if injection by setters are only available for non-mapped fields (e.g. fields with no @Column annotation or with @Transient annotation). All managed fields should be injected by constructor. Let's keep the impl simple, we can always extend it with more features after this issue is solved.

the @column in the constructor was a question I had but it was leading to having potentially a lot of constructor parameter annotations (like codecs etc) which makes the constructor poorly readable --> Agree, I also think that forcing users to put @Column on every constructor param is very boilerplate and annoying

So I would propose the following rules:

  1. With regard to constructors a. when a single non-default constructor is found, use it b. when multiple non-default constructors are found, use the one with @EntityCreator or exception if annotation not found
  2. For each mapped field (having @Column or not @Transient if implicit mapping strategy is used) a. Require them to be present in the @EntityCreator constructor and the parameter name should match the field name as well as their type. Example:

           @Table
          public class MyEntity {
                @PartitionKey 
                public final UUID id;
    
                @Column
                public final String value;
    
                // Correct declaration of construction injection
                @EntityCreator
                public MyEntity(UUID id, String value) {
                      this.id = id;
                      this.value = value;
                }
    
                // Incorrect declaration of construction injection
                // myId parameter name does not match the field "id" name
                @EntityCreator
                public MyEntity(UUID myId, String value) {
                      this.id = myId;  
                      this.value = value;
                }
    
                // Incorrect declaration of construction injection
                // id parameter type (Longà does not match the field "id" type (UUID)
                @EntityCreator
                public MyEntity(Long id, String value) {
    
                      this.value = value;
                }
         }

    b. Require them to have public final modifiers and not getter/setter

For the implementation there is an existing BeanFactory that is correctly decoupled from the other classes. I think it would make sense to create another ImmutableBeanFactory and pass the responsibility of instantiation by constructor injection to this class.

In any case, we need extra metadata for constructor injection

What do you think @rmannibucau ?

dblevins commented 7 years ago

On "the parameter name should match the field name as well as their type" the tragic limitation of Java over the last 20 years is the parameter names of methods and constructors is compiled away, not in the bytecode*, and unavailable at runtime via reflection or any means.

I put a * there as there is a "trick" you can use if the class was compiled with debug information. The debug table can be inspected via a bytecode tool like ASM to find the parameter names. I can point out code for that if we think it's attractive as a convenience. Most people do compile with debug on. The JVM for, example, ships with it off which is why you get "Information unavailable" in your IDE. This is something you can know by looking at the class once, if it is there you can rely on it.

That trick aside, the next tempting trick would be to match the order of the fields with the order of the constructor parameters. Unfortunately, field and methods as reported via reflection are not in a guaranteed order either. They will appear that way in most of the time, but it's not a guarantee and you will eventually get bit.

From here there are three options that will always work.

List the parameter names in @EntityCreator

@Table
public class MyEntity {
    @PartitionKey
    public final UUID id;

    @Column
    public final String value;

    // Correct declaration of construction injection
    @EntityCreator({"id", "value"})
    public MyEntity(UUID id, String value) {
        this.id = id;
        this.value = value;
    }

    // Correct declaration of construction injection
    @EntityCreator({"id", "value"})
    public MyEntity(UUID myId, String value) {
        this.id = myId;
        this.value = value;
    }

    // Incorrect declaration of construction injection
    // myId parameter name does not match the field "id" name
    @EntityCreator({"myId", "value"})
    public MyEntity(UUID myId, String value) {
        this.id = myId;
        this.value = value;
    }

    // Incorrect declaration of construction injection
    // insufficient number of parameter names supplied
    @EntityCreator({"id"})
    public MyEntity(UUID myId, String value) {
        this.id = myId;
        this.value = value;
    }

    // Incorrect declaration of construction injection
    // id parameter type (Longà does not match the field "id" type (UUID)
    @EntityCreator
    public MyEntity(Long id, String value) {

        this.value = value;
    }
}

Put the annotations on the constructor parameters.

This was discussed as noisy. It is, unfortunately. And since there are no parameter names available in the bytecode, it's a bit noisier than imagined.

@Table
public class MyEntity {
    public final UUID id;

    public final String value;

    // Correct declaration of construction injection
    @EntityCreator
    public MyEntity(@PartitionKey UUID id, @Column("value") String value) {
        this.id = id;
        this.value = value;
    }
}

Ultimately this is less noisy than a builder pattern which requires an additional class, but that could work too.

Allow final fields and handle it like Serialization does

It's not widely known, but you actually set a final field in Java. Final fields were imagined as something that would provide compile time and runtime benefits. It didn't turn out that way and they are largely compile-time in benefit, the JVM sees them as mutable.

This knowledge is used in a major Java feature, object serialization. You can serialize and deserialize objects with final fields. The JVM pulls it off using sun.misc.Unsafe to set fields during deserialization, it does not use reflection.

Here's a class that is ASL 2.0 and shows how:

Unsafe looks like a non-portable class, but it is a de-facto standard Java, supported by all JVMs, and used by countless libraries. There are several major products that have their "one cool feature" tied to it, such as Hazelcasts Off-Heap memory storage. Oracle tried to take it away in Java 9 and the whole industry revolted. As of Java 9 Unsafe has a new package jdk.internal.misc.Unsafe to make it look vendor neutral, but still make it clear it is a low-level internal class.

doanduyhai commented 7 years ago

On "the parameter name should match the field name as well as their type" the tragic limitation of Java over the last 20 years is the parameter names of methods and constructors is compiled away, not in the bytecode*, and unavailable at runtime via reflection or any means. --> You're right but since we're using annotation processors, we can have access to the source code and thus parameter names, don't we ?

Other than that, I really like your idea of declaring parameter names on the @EntityCreator annotation, it's way less verbose than having @Column on every parameter.

I'm going to investigate to check if we can get parameter name easily with annotation processors. Worst case we can fallback on using @EntityCreator with declared parameter names

doanduyhai commented 7 years ago

A quick check shows that with annotation processors we can access parameter name as well as their type:


final List<ExecutableElement> executableElements = ElementFilter.constructorsIn(aptUtils.elementUtils.getAllMembers(elm));
final ExecutableElement constructor = executableElements.get(0);
final VariableElement parameter = constructor.getParameters().get(0);

final String parameterName = parameter.getSimpleName().toString();
final TypeMirror parameterType = parameter.asType();
rmannibucau commented 7 years ago

Quickly on the parameter name thing: since java 8 you can access it by reflection using new Parameter API (needs a flag at compile time to not default to arg0...). So processor or not it is ok. However - as @Factory in this PR - it is nice to be able to pass it explicitly (we can use processor if not done - since it is common to not match param names and field names.

Now on limitations: I dont understand why forcing final or to have all mapped fields in the constructor. If you check the code doing or not doesnt change much so it is on the best practise side of things and not a code complexity thing.

On using or not the constructor: to avoid a user to define one by pojo convenience and get it used im tempted to keep the annotation mandatory except for noarg one (like today case).

Last thing is ImmutableBeanFactory is not that great cause wouldnt allow to override easily in all cases - constructor or not - the impl. I tend to think BeanFactory abstraction is maybe not the right level and desire is more a postconstruct hook which would be the same for both case. Wdyt? Another feature (other pr probably) would be the support of creator annotation on a static (or not?) method which would enable achilles to get rid of the bean factory api :).

rmannibucau commented 7 years ago

renamed @Factory to comply to the discussion (@EntityCreator) + added support of implicit parameter names if not set by the user

doanduyhai commented 7 years ago

Now on limitations: I dont understand why forcing final or to have all mapped fields in the constructor. --> Because that is what immutable entity means.

Final fields is crucial for multi-threading. Since the fields are final the entity is sure to be immutable and can be used in an multi-threading context without any synchronization

Last thing is ImmutableBeanFactory is not that great cause wouldnt allow to override easily in all cases - constructor or not - the impl --> In that case, make AbstractImmutableBeanFactory an interface with default methods so that you can extend it easily.

I tend to think BeanFactory abstraction is maybe not the right level and desire is more a postconstruct hook which would be the same for both case --> If what you need is postconstruct to invoke custom logics, why don't you use Interceptors ? They have been designed for this.

rmannibucau commented 7 years ago

Yep but why enforcing immutable mapping where it doesnt change anything for the runtime? Code would be the same and user can rely on it or not and use the pattern he desires - message passing or builder one.

Abstracting the bean factory doesnt help since if achilles own the constructor then BeanFactory API is broken by design. What was the original goal? Injections? If so postconstruct api is maybe saner or interceptors. If beanfactory is overkill let s ignore it and deprecate it.

doanduyhai commented 7 years ago

Yep but why enforcing immutable mapping where it doesnt change anything for the runtime? -->

Didn't you want to have immutable models originally ?

I quote

Overall idea is to introduce a @factory(/list of field names/) you can put on a constructor to create immutable models.

For me, immutable models means:

Abstracting the bean factory doesnt help since if achilles own the constructor then BeanFactory API is broken by design. What was the original goal? Injections? If so postconstruct api is maybe saner or interceptors. If beanfactory is overkill let s ignore it and deprecate it. --> The idea of BeanFactory was to decouple instanciation of entities (new MyEntity(...)) from object mapping (decoding Cassandra columns from each row into field instances for the entity) and codec transformation.

That's a clear separation of responsibility. BeanFactory just create the instance and then passes it to other services for Object Mapping. Of course if we have an ImmutableBeanFactory, the object mapping part should be done BEFORE calling the factory since we need to feed all the field values at entity construction

doanduyhai commented 7 years ago

Another remark, when implementing Immutable models, it is preferable to perform the parsing of constructor at compile time using annotation processor and not at runtime using reflection. Why ?

  1. For the sake of consistency of code design. All Achilles parsing & model validation is done at compile time, it would be a code smell to have some parsing now delegated at runtime. Maintaining it would be much harder

  2. In general, you get more information on the source code using annotation processor than using reflection, even though since Java 8 there is a lot of progress from this point. Still, nested generics like Map<String, List<Integer>> is a nightmare to parse using reflection whereas it's a piece of cake with annotation processors

rmannibucau commented 7 years ago

Then beanfactory is not an extension and can be broken right?

My immutable need was an obvious example but just a very small subpart of the constructor instantiation need (see jackson samples for instance)

doanduyhai commented 7 years ago

I'm having a detailed look into your PR code, let me some time to finish it and have a global status

rmannibucau commented 7 years ago

Awesome. Let me know if i need to rework the filtering of constructor/setter properties.

dblevins commented 7 years ago

All Achilles parsing & model validation is done at compile time

If that's the case, excellent. Sky is the limit at compile time. The runtime world presents only a handful of bad options in this regard.

doanduyhai commented 7 years ago

Ok @rmannibucau I've looked into detailed the PR, I think there was a misunderstanding from my side. In fact there are clearly 2 distinct requirements:

  1. Creating new entity instance using non-default constructor annotated with @EntityCreator

  2. Implementing an immutable entity with all the constraints in term of final field, no getter/setter ...

As far as I understand, this PR is about point 1). OK

Now more into the impl detail, you chose to rely on Reflection (java.lang.reflect.Constructor) to create an instance of the entity with appropriate parameters (constructor.newInstance(...)). Although it is feasible it forces us to rely on runtime reflection as well as using a lot of classes/API from the java.lang.reflect package (for example, the AsbtractEntityProperty now has a java.lang.reflect.Constructor field).

Furthermore, the access to the constructor is done by reflection for every new instance of the entity:


  // This constructor discovery mechanism is done for EVERY new instance of the entity
  @Override
   protected Constructor<TestEntityWithComplexIndices> getConstructor() {
     return Stream.of(getEntityClass().getConstructors())
                 .filter(x -> x.isAnnotationPresent(EntityCreator.class)).findFirst()
                 .map(Constructor.class::cast)
                 .orElseGet(() -> {
                     try {
                         return Constructor.class.cast(getEntityClass().getConstructor());
                     } catch (NoSuchMethodException e) {
                         throw new IllegalArgumentException("Invalid class " + getEntityClass().getName());
                     }
                  });
  }

My suggestion is to not use java.lang.reflect.Constructor for instanciation but generate the source code right away to create new instance of the entity inside the generated class MyEntity_AchillesMeta.

For example, with normal instanciation by default constructor:


public final class MyEntity_AchillesMeta {

  ...

  //Generated method
  public MyEntity newInstance() {
    return new MyEntity();
  }
}

Instanciation with custom constructor:

public class MyCustomEntity {

   @PartitionKey
   private UUID id;

   @ClusteringColumn
   private Date date;

   @Column
   private String value;

    //Default constructor
    public MyCustomEntity() {};

    @EntityCreator
    public MyCustomEntity(UUID id, Date date) {
      this.id = id;
      this.date = date;
    }
}

public final class MyCustomEntity_AchillesMeta {

  ...

  //Generated method by annotation processor !!!!
  public MyCustomEntity newInstance(Object[] parameters) {
    final UUID id = (UUID)parameters[0];
    final Date date = (Date)parameters[1];
    return new MyEntity(id, date);
  }
}

Please remark the manual type-casting from Object to the appropriate type. Since we get all the type information at compile time, it is safe to perform such casting.

This also means that we should remove completely the BeanFactory interface and impls since instanciation code is now generated directly in metadata class

Now, for the method AbstractEntityProperty.createEntityFrom():

public T createEntityFrom(Row row) {
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(format("Create entity of type %s from Cassandra row %s",
                    entityClass.getCanonicalName(), row));
        }
        if (row != null) {
             if(hasCustomConstructor()) {
                T newInstance = this.newInstance();
                 // Plug-in existing code
             } else {
                  Object[] constructorInjectedFields = constructorProperties
                    .stream()
                    .map(x -> x.decodeFromGettable(row))
                    .toArray(Object[]::new));     

                T newInstance = this.newInstance(constructorInjectedFields);

                //setter now
                ...
            }
            return newInstance;
        }
        return null;
    }

To avoid declaring manually fieldnames on @EntityCreator annotation, I would prefer we require that the parameter name should match the field name as well as their type. The check would be done at compile time and we raised error message. I feel this is not a big constraint on user and anyway most of developers get their constructor args generated by IDEs and the parameter name usually match the field name anyway.

WDYT ?

rmannibucau commented 7 years ago

Yep, point 2 being up to the dev.

constructor is looked up once then cached so runtime overhead is ~ null and BeanFactory still can be useful but if we drop this API + to generate the "factory" it avoids the Object[] overhead so I'm fine with both options.

About param names I saw weird but used (as "in several projects") conventions so being able to override default names is nice I think.

doanduyhai commented 7 years ago

About param names I saw weird but used (as "in several projects") conventions so being able to override default names is nice I think.

--> I'm not strongly opinionated on this. So to recap:

  1. Either @EntityCreator without any value on the annotation, in this case parameter name should match field name. This will represent 99% of the use-cases

  2. Or users want to have different parameter names, in this case they should declare all parameter names in the annotation @EntityCreator("param1", "param2", ..., "paramN")

rmannibucau commented 7 years ago

With the brackets for the array yes ;). This is what should be on the pr. Probably needs a test for default case though.

doanduyhai commented 7 years ago

And what's about my suggestion not using java.lang.reflection API ? I found it much clearer and readable to have

public MyCustomEntity newInstance(UUID id, Date date) {
    return new MyEntity(id, date);
  }

rather than all the reflection and filtering on constructors to find the good one. Since we're parsing at compile time, let's use generated code as well for entity instanciation.

Now you will ask me how you can extract the UUID and Date exact type from a List<AbstractProperty<MyEntity, ?, ?>>, my answer would be don't use List<AbstractProperty<MyEntity, ?, ?>> constructorProperties, rather you have each column metadata accessible in the MyEntity_AchillesMeta as field so use them directly:

public final class MyEntity_AchillesMeta {

    //Generated
    public static final SimpleProperty<MyEntity, UUID, UUID> id = ...

    //Generated
    public static final SimpleProperty<MyEntity, Date, date> date = ...

}

Throw an eye at some generated metadata classes in the integration_test_21 sub-module to have a good idea about the internal layout of each metadata class

rmannibucau commented 7 years ago

This one was awaiting a "we will drop BeanFactory completely" from your side. If not i think it is bad to have it sometimes used and sometimes not, if yes Im all for it. About meta i think it is already the case in the pr. Just settable fields are filtered removing constructor ones when preparing the runtime in meta constructor, no?

Side note: found more understandable this impl but no blocker to precompute both list without interaction at all

doanduyhai commented 7 years ago

You can drop the BeanFactory completely if you're generating method to instantiate the entities inside metadata class.

doanduyhai commented 7 years ago

About meta i think it is already the case in the pr. Just settable fields are filtered removing constructor ones when preparing the runtime in meta constructor, no? --> No, you still use java.lang.reflect.Constructor and have the method getConstructor() in the PR.

If we use 100% generated code, we don't even need reference to Constructor object, we can just generate a new MyEntity(...) directly in the source code

rmannibucau commented 7 years ago

Yep. Will drop the constructor when ill get time - hopefully before next week.

rmannibucau commented 7 years ago

Code updated for review

doanduyhai commented 7 years ago

Code review done