FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)
Apache License 2.0
408 stars 149 forks source link

CsvMapper applies Property Order alphabetically for Java 16 Records #264

Open alexpartsch opened 3 years ago

alexpartsch commented 3 years ago

I'm trying to use CsvMapper to parse a Java 16 Record out of a CSV file. While testing it I've encountered some strange behaviour:

It seems CsvMapper.schemaFor(AnyRecord.class).withHeader() always returns a schema where the columns are assumed to be sorted alphabetically by name and the header row is not considered.

Here's an example:

public record TestRecord(String c, String b, String a) {
}

Running the JUnit 5 test:

    @Test
    void testRecord() throws JsonProcessingException {
        CsvMapper csvMapper = new CsvMapper();
        var schema = csvMapper.schemaFor(TestRecord.class).withHeader();
        var csv = """
                c,b,a
                C,B,A""";
        TestRecord tr = csvMapper.readerFor(TestRecord.class).with(schema).readValue(csv);
        Assertions.assertEquals(new TestRecord("C", "B", "A"), tr);
    }

Will fail with:

expected: <TestRecord[c=C, b=B, a=A]> but was: <TestRecord[c=A, b=B, a=C]>
Expected :TestRecord[c=C, b=B, a=A]
Actual   :TestRecord[c=A, b=B, a=C]

So, the parsed instance of TestRecord was initialised in the wrong order.

Doing the same thing with a class:

import java.util.Objects;

public class TestClass {

    public String a;
    public String b;
    public String c;

    public TestClass(String a, String b, String c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public TestClass() {
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        TestClass testClass = (TestClass) o;
        return Objects.equals(a, testClass.a) && Objects.equals(b, testClass.b) && Objects.equals(c, testClass.c);
    }

    @Override
    public int hashCode() {
        return Objects.hash(a, b, c);
    }
}

Running the test:

    @Test
    void testClass() throws JsonProcessingException {
        CsvMapper csvMapper = new CsvMapper();
        var schema = csvMapper.schemaFor(TestClass.class).withHeader();
        var csv = """
                c,b,a
                C,B,A""";
        TestClass tr = csvMapper.readerFor(TestClass.class).with(schema).readValue(csv);
        Assertions.assertEquals(new TestClass("C", "B", "A"), tr);
    }

Which passes fine!

Having a record where the properties are ordered alphabetically by name:

public record TestRecordWithSortedProperties(String a, String b, String c) {
}

A similar test case to the first works fine (not the CSV data is still ordered c,b,a):

    @Test
    void testRecordWithSortedProperties() throws JsonProcessingException {
        CsvMapper csvMapper = new CsvMapper();
        var schema = csvMapper.schemaFor(TestRecordWithSortedProperties.class).withHeader();
        var csv = """
                c,b,a
                C,B,A""";
        TestRecordWithSortedProperties tr = csvMapper.readerFor(TestRecordWithSortedProperties.class).with(schema).readValue(csv);
        Assertions.assertEquals(new TestRecordWithSortedProperties("C", "B", "A"), tr);
    }

Also defining a schema directly will work:

    @Test
    void testRecordWithManualSchema() throws JsonProcessingException {
        CsvMapper csvMapper = new CsvMapper();
        var schema = CsvSchema.builder()
                .addColumn("c")
                .addColumn("b")
                .addColumn("a")
                .build()
                .withHeader();
        var csv = """
                c,b,a
                C,B,A""";
        TestRecord tr = csvMapper.readerFor(TestRecord.class).with(schema).readValue(csv);
        Assertions.assertEquals(new TestRecord("C", "B", "A"), tr);
    }

While an annotated record won't again:

public record AnnotatedTestRecord(@JsonProperty("c") String c, @JsonProperty("b") String b, @JsonProperty("a") String a) {
}

Using @JsonPropertyOrder works well!

import com.fasterxml.jackson.annotation.JsonPropertyOrder;

@JsonPropertyOrder({"c", "b", "a"})
public record TestRecord(String c, String b, String a) {
}

I guess the actual errors lies somewhere in the CSV deserilaizer, since alphabetical property order seems to be the default in jackson. I tried to debug into the whole process but couldn't pin point it.

cowtowncoder commented 3 years ago

Interesting. Yes, sounds incorrect: Order for Records should be considered declaration order even if otherwise default is set to be alphabetic (lexicographic; ordering is case-sensitive).

The root cause is probably related to the fact that while there is no default ordering with Jackson in general, CsvMapper does change default configured to do alphabetic ordering.

I wonder if this could be easily reproduced with JSON ObjectMapper by just enabling MapperFeature.SORT_PROPERTIES_ALPHABETICALLY? If so it'd be easier to reproduce and probably simpler to fix.

alexpartsch commented 3 years ago

@cowtowncoder enabling the property you've mentioned on a JSON ObjectMapper like:

    @Test
    void testJsonOrder() throws JsonProcessingException {
        var om = new ObjectMapper();
        om.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, true);
        var json = """
                {
                    "c": "C",
                    "b": "B",
                    "a": "A"
                }""";
        var tr = om.readValue(json, TestRecord.class);
        Assertions.assertEquals(new TestRecord("C", "B", "A"), tr);
    }

With the record:

public record TestRecord(String c, String b, String a) {
}

Will not reproduce the issue.

cowtowncoder commented 3 years ago

Ah. No, that should not matter quite the same way since JSON does not need name/column-position mapping. But what I think is that alphabetic ordering should not actually change expected serialization order for Records. Even then, however, mapping should not get confused so that's odd.

One thing to note however: I am not sure why you are both building schema from Record and using withHeader()? Only one is used, and it really has to be header row anyway. So there is not much point building it from record type. I would expect test still pass so it seems there is a bug somewhere but thought I'd ask about usage -- maybe there is some other implied expectation?

cowtowncoder commented 3 years ago

Ok, since I did not recall likely reasons, I checked out CsvSchema, which points that the use case of having schema AND expecting a header is left for a case where:

            When the header line is present and the settings ask for it
            to be processed, two different options are possible:

            a) The schema has been populated.  In this case, build a new
               schema where the order matches the *actual* order in which
               the given CSV file offers its columns, if _schema.reordersColumns()
               is set to true; there cases the consumer of the CSV file
               knows about the columns but not necessarily the order in
               which they are defined.

Not sure if that helps explain anything but at least it confirms one possible reason to use both.

alexpartsch commented 3 years ago

@cowtowncoder running the code without withHeader(), like in this example:

    @Test
    void testRecord() throws JsonProcessingException {
        CsvMapper csvMapper = new CsvMapper();
        var schema = csvMapper.schemaFor(TestRecord.class);
        var csv = """
                c,b,a
                C,B,A""";
        TestRecord tr = csvMapper.readerFor(TestRecord.class).with(schema).readValue(csv);
        Assertions.assertEquals(new TestRecord("C", "B", "A"), tr);
    }

yields in:

expected: <TestRecord[c=C, b=B, a=A]> but was: <TestRecord[c=a, b=b, a=c]>
Expected :TestRecord[c=C, b=B, a=A]
Actual   :TestRecord[c=a, b=b, a=c]

So the header row is assumed as first data row.

cowtowncoder commented 3 years ago

Consuming header row (and since it's there, using too) makes sense. But if so, you probably don't really need to construct schema? Schema is only needed to map column index to/from logical property name. I am not saying there is no bug to fix here, just trying to understand the use case and perhaps suggest a way to work around it for now.

seanf commented 2 years ago

This problem affects the writing of Java records too. I want to define the names and order of my CSV columns in a Java record, but they come out in alphabetical order unless I use @JsonPropertyOrder.

My current workaround looks like this (use at your own risk):

  public static CsvSchema schemaWithHeader(CsvMapper csvMapper, Class<?> recordClass) {
    if (!recordClass.isRecord()) {
      throw new RuntimeException("Must use a Java record class for defined ordering");
    }
    var defaultSchema = csvMapper.schemaFor(recordClass);
    var schema = CsvSchema.builder();
    for (RecordComponent component : recordClass.getRecordComponents()) {
      String columnName = getPropertyName(component);
      var columnType = defaultSchema.column(columnName).getType();
      schema.addColumn(columnName, columnType);
    }
    return schema.build().withHeader();
  }

  private static String getPropertyName(RecordComponent component) {
    String name = component.getName();
    if (component.isAnnotationPresent(JsonProperty.class)) {
      name = component.getAnnotation(JsonProperty.class).value();
    } else if (component.getAccessor().isAnnotationPresent(JsonProperty.class)) {
      name = component.getAccessor().getAnnotation(JsonProperty.class).value();
    }
    return name;
  }
cowtowncoder commented 2 years ago

Yes, CsvMapper enforces stable ordering (alphabetic) for Java objects; and this predates addition of Record type.

I am not quite sure how this should be handled: given that Records do have stable ordering to use, it would seem like that should be the default, even for CSV module. But from implementation and configuration perspective there is no way to really do that at the moment.

pjfanning commented 1 year ago

Quite a few changes have been made in jackson-databind 2.15.0-SNAPSHOT relating to how Records are handled. If anyone wants to try the latest snapshot jars to see if they have changed CsvMapper behaviour, that would be great.

agavrilov76 commented 1 year ago

A possible workaround is to register a custom annotation introspector following the record component order:

  private static class OrderedRecordIntrospector extends JacksonAnnotationIntrospector {

    @Override
    public String[] findSerializationPropertyOrder(final AnnotatedClass ac) {
      return ac.getRawType().isRecord() && !ac.hasAnnotation(JsonPropertyOrder.class)
          ? Arrays.stream(ac.getRawType().getRecordComponents())
              .map(RecordComponent::getName)
              .toArray(String[]::new)
          : super.findSerializationPropertyOrder(ac);
    }
  }
mijoypad commented 1 year ago

Another possible workaround is to use @JsonProperty with index ;), e.g.:

public record TestRecord(
@JsonProperty(index = 0) String c,
@JsonProperty(index = 1) String b, 
@JsonProperty(index = 2) String a) {}