Support parsing CSV with header regardless of unknown columns

FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)

Apache License 2.0

Support parsing CSV with header regardless of unknown columns #286

Open bjmi opened 3 years ago

bjmi commented 3 years ago

When reading the following CSV with jackson-dataformat-csv 2.11.4

name,weight,age
Roger,69,27
Chris,89,53

using the following snippet

CsvMapper csvMapper = new CsvMapper();
CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true)
        .addColumn("name").addColumn("age").build();
List<Person> persons = csvMapper
        .readerFor(Person.class)
        .with(csvSchema)
        .<Person> readValues(csv)
        .readAll();
...
class Person {
    public String name;
    public int age;
}

a CsvMappingException is thrown (Too many entries: expected at most 2) because the column weight is not known to the CsvSchema. Setting csvMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); still leads to the same CsvMappingException. Please introduce a new CsvParser feature, e.g. IGNORE_UNKNOWN_COLUMNS (disabled by default), that allows reading CSV regardless of unknown columns.
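To make the requested behaviour concrete, here is a minimal plain-Java sketch of what "read by header name, ignore unknown columns, still require the declared ones" could mean. It is not Jackson code and does not reflect how CsvParser is implemented; the class HeaderProjection and its method project are hypothetical names, and the sketch assumes simple CSV with a comma separator and no quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of this is Jackson API.
final class HeaderProjection {

    // Reads CSV text whose first line is a header and keeps only the required columns.
    // Unknown header columns are skipped; a missing required column is an error.
    // Assumes simple CSV: comma separator, no quoting, every row as wide as the header.
    static List<Map<String, String>> project(String csv, List<String> required) {
        String[] lines = csv.strip().split("\\R");
        List<String> header = Arrays.asList(lines[0].split(",", -1));

        // Resolve each required column to its position in the header.
        int[] positions = new int[required.size()];
        for (int i = 0; i < required.size(); i++) {
            positions[i] = header.indexOf(required.get(i));
            if (positions[i] < 0) {
                throw new IllegalArgumentException("Missing header column: " + required.get(i));
            }
        }

        // Pick the required cells out of every data row, by resolved position.
        List<Map<String, String>> rows = new ArrayList<>();
        for (int r = 1; r < lines.length; r++) {
            String[] cells = lines[r].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < positions.length; i++) {
                row.put(required.get(i), cells[positions[i]]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "name,weight,age\nRoger,69,27\nChris,89,53\n";
        // prints [{name=Roger, age=27}, {name=Chris, age=53}]
        System.out.println(project(csv, List.of("name", "age")));
    }
}
```

With the sample data from this report, the weight column is dropped and the name and age values stay aligned with their header names.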

kpankowski commented 3 years ago

Reorder the columns:

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true)
        .addColumn("name").addColumn("age").build();

Or skip adding columns explicitly when using setUseHeader(true):

CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

bjmi commented 3 years ago

> Reorder the columns:
>
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).setReorderColumns(true).addColumn("name").addColumn("age").build();

But the use case expects the columns name and age in the given order and should fail otherwise. At the moment, explicitly declaring header columns and the column-reordering feature are mutually exclusive because of this: https://github.com/FasterXML/jackson-dataformats-text/blob/810772312735f1fb89d6fa37dd70e150e9cc783b/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvParser.java#L787, which can be considered a bug.

> or skip adding columns explicitly when using setUseHeader(true)
>
> CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();

But then the FAIL_ON_MISSING_COLUMNS feature can no longer be used, and name and age are no longer required columns.
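The combined requirement described here (declared columns must be present and in the declared order, while unknown columns in between are tolerated) amounts to checking that the declared names form an ordered subsequence of the header. A minimal plain-Java sketch of that check follows; it is not Jackson code, and the names HeaderOrderCheck and matchesInOrder are hypothetical.

```java
import java.util.List;

// Plain-Java illustration only; none of this is Jackson API.
final class HeaderOrderCheck {

    // Returns true when every declared column occurs in the header,
    // in the declared order, with any number of unknown columns in between.
    static boolean matchesInOrder(List<String> header, List<String> declared) {
        int next = 0; // index of the declared column we are still looking for
        for (String column : header) {
            if (next < declared.size() && column.equals(declared.get(next))) {
                next++;
            }
        }
        return next == declared.size();
    }

    public static void main(String[] args) {
        List<String> declared = List.of("name", "age");
        System.out.println(matchesInOrder(List.of("name", "weight", "age"), declared)); // true
        System.out.println(matchesInOrder(List.of("age", "weight", "name"), declared)); // false
        System.out.println(matchesInOrder(List.of("name", "weight"), declared));        // false
    }
}
```

Under this reading, the header name,weight,age is accepted for the declared columns name and age, while a swapped or incomplete header is rejected.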

ZijiePan1996 commented 1 year ago

I encountered the same issue with jackson-dataformat-csv 2.13.4 while trying to parse a CSV file (>100 columns) into a Java entity (10 attributes). I tried to use

ObjectReader csvReader = csvMapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
        .readerFor(BlackList.class)
        .with(csvSchema);

But I found that the values in the unknown columns are parsed into the next column, which messed up the data in the DB. As @bjmi mentioned, IGNORE_UNKNOWN_COLUMNS would likely solve my problem.

redvasily commented 1 year ago

I can get it to work if, when reading, I use a schema with .withHeader() and .withColumnReordering().

FAIL_ON_UNKNOWN_PROPERTIES is disabled for me, but I didn't test if it's necessary.

So in the end I am using two different schemas: one for writing, without column reordering, and one for reading, with column reordering.