FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)
Apache License 2.0
407 stars 149 forks source link

Found bug in CSV Mapper when we user PolyMorphism #202

Open loverzpark opened 4 years ago

loverzpark commented 4 years ago

Please find the details of the bug:

https://stackoverflow.com/questions/62235943/csvmapper-unable-to-parse-csv-with-abstract-class-mapping

Exception in thread "main" com.fasterxml.jackson.dataformat.csv.CsvMappingException: Too many entries: expected at most 1 (value #1 (8 chars) "BILLER_1") at [Source: (com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader); line: 1, column: 5] at com.fasterxml.jackson.dataformat.csv.CsvMappingException.from(CsvMappingException.java:28) at com.fasterxml.jackson.dataformat.csv.CsvParser._reportCsvMappingError(CsvParser.java:1246) at com.fasterxml.jackson.dataformat.csv.CsvParser._handleExtraColumn(CsvParser.java:1001) at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:862) at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:609) at com.fasterxml.jackson.core.util.JsonParserSequence.switchAndReturnNext(JsonParserSequence.java:234) at com.fasterxml.jackson.core.util.JsonParserSequence.nextToken(JsonParserSequence.java:152) at com.fasterxml.jackson.core.JsonParser.nextFieldName(JsonParser.java:861) at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:295) at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:189) at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeWithType(BeanDeserializerBase.java:1196) at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:280) at com.fasterxml.jackson.databind.MappingIterator.readAll(MappingIterator.java:320) at com.fasterxml.jackson.databind.MappingIterator.readAll(MappingIterator.java:306)

loverzpark commented 4 years ago

Possible issue I could figure-out is number of schema Columns,

In our Abstract Class, there is no column is defined, while it goes to read first row, it sets type fine, but when it tries to move to next column read, it founds wrong index (total column are 1, and current index is 1 and move to 2..3..4 and so on which should not happen, this leads to move the code flow for extraAnyProperty code)

@cowtowncoder plz take a look

cowtowncoder commented 4 years ago

Code from SO; types (seems to use... Lombok?):

@Getter
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, 
property = "type", visible = true, 
include = JsonTypeInfo.As.PROPERTY)
@JsonSubTypes({
        @Type(value = FHR.class, name = "FHR"),
        @Type(value = BHR.class, name = "BHR")})
public class PaymentBatchRecord {
    protected String type;
}

@Getter
@Setter
@JsonPropertyOrder({
//        "type",
        "transmit_id",
        "password",
        "creation_date",
        "creation_time",
        "file_format_code",
        "file_reference_code"
})
class FHR extends PaymentBatchRecord implements Serializable {
    private final static long serialVersionUID = -584359005702082280L;
    //    @JsonProperty("type")
//    private String type;
    @JsonProperty("transmit_id")
    private String transmitId;
    @JsonProperty("password")
    private String password;
    @JsonProperty("creation_date")
    private String creationDate;
    @JsonProperty("creation_time")
    private String creationTime;
    @JsonProperty("file_format_code")
    private String fileFormatCode;
    @JsonProperty("file_reference_code")
    private String fileReferenceCode;
}

@Setter
@Getter
@JsonPropertyOrder({
//        "type",
        "transaction_type",
        "merchant_id",
        "merchant_name",
        "batch_entry_description",
        "batch_reference_code",
        "batch_number"
})
class BHR extends PaymentBatchRecord implements Serializable {
    private final static long serialVersionUID = 1650905882208990490L;
    //    @JsonProperty("type")
//    private String type;
    @JsonProperty("transaction_type")
    private String transactionType;
    @JsonProperty("merchant_id")
    private String merchantId;
    @JsonProperty("merchant_name")
    private String merchantName;
    @JsonProperty("batch_entry_description")
    private String batchEntryDescription;
    @JsonProperty("batch_reference_code")
    private String batchReferenceCode;
    @JsonProperty("batch_number")
    private Integer batchNumber;
}

Sample CSV input:

FHR,BILLER_1,"biller1pwd","20200224","091503","CSV","202002240915031"
BHR,"PMT","BILLER_1","BILLER 1 NAME","UTILITY BILL",,1

and code

        CsvMapper mapper = new CsvMapper();
        // uncomment it to run but with all the null values
// mapper.enable(CsvParser.Feature.IGNORE_TRAILING_UNMAPPABLE)
        ;
        CsvSchema sclema = mapper.schemaFor(PaymentBatchRecord.class)
                .withoutHeader();

        MappingIterator<PaymentBatchRecord> iterator = mapper
                .readerFor(PaymentBatchRecord.class)
                .with(sclema)
                .readValues(in);

        List<PaymentBatchRecord> ppojos = iterator.readAll();
loverzpark commented 4 years ago

@cowtowncoder I tested without lombok annotations.

Same code will output correctly with json, but it won't work with csv. Inheritance is the issue and it can't read columns of subclasses But same doesn't happen with json and get correct read/write operations.

Ignore unknown true will start output but it'll be empty object

cowtowncoder commented 4 years ago

Right, this is a limitation of CSV module: schema generation will simply find all properties that specific class has -- and if the base type only has some, it will not work for subtypes that have other properties. You will instead need to either build CsvSchema manually, or use a sub-class that has all properties any type would have. An additional challenge is that only some forms of @JsonTypeInfo can work; EXISTING_PROPERTY and PROPERTY do, others not.

cowtowncoder commented 4 years ago

Maybe I should expand on my previous note a bit.

So, CSV module has to use 2 pieces of information, whereas JSON works with just one:

  1. Introspected properties of POJO (used by jackson-databind, all backends). This is format-agnostics
  2. Mapping of column indexes to logical names -- something CSV requires since it is tabular format, instead of records of name/value pairs (like JSON)

The problem here is that for (2), introspection only looks at properties found on class you give: if it is a base class, it will not have properties of (all of) the subclass(es), and as a result, attempt to read or write a sub-class instance will run into problems. Same may also be problematic for Type Id property, depending on how @JsonTypeInfo is used.

Given this, polymorphic handling will be difficult with CSV, and I suspect that currently it would only work for cases where all subtypes have same fields; or if there is one specific sub-class that has all properties of all subtypes. If so, you could generate CsvSchema using that sub-class, even if using base type as target class for readValue() call (or for ObjectReader creation).

This is something that would be good to improve on in future, but may be challenging to implement -- beyond question of finding subtypes (which is doable, in general at least) there is the challenge of figuring out full ordering for all properties for all subtypes. The challenge with ordering is that it must be universal across all subtypes.