FasterXML / jackson-dataformats-text

Uber-project for (some) standard Jackson textual format backends: csv, properties, yaml (xml to be added in future)
Apache License 2.0

Use `CsvSchema.ColumnType.NUMBER` and such to coerce field values from plain `String`s #167

Status: Open (opened by zenglian 4 years ago)

zenglian commented 4 years ago

Current: https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv

```java
File csvFile = new File(fileName);
CsvMapper mapper = new CsvMapper();
CsvSchema schema = ...;
MappingIterator<Map<String,String>> it = mapper.readerFor(Map.class)
   .with(schema)
   .readValues(csvFile);
while (it.hasNext()) {
  Map<String,String> rowAsMap = it.next();
  // access by column name, as defined in the header row...
}
```

Desired:

```java
MappingIterator<Map<String,Object>> it = mapper.readerFor(Map.class)
   .with(schema)
   .readValues(csvFile);
```

For example, if the first column in the schema is `CsvSchema.ColumnType.NUMBER` (ideally covering both integer and floating-point values), then the first entry in each row map should be a number instead of a string.
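Until the schema drives value types, one workaround is to post-process each `Map<String,String>` row. The sketch below is a JDK-only illustration, not Jackson API: `ColumnKind` is a hypothetical stand-in for `CsvSchema.ColumnType` (only `STRING`, `NUMBER` and `BOOLEAN` are modeled), and the number rule (integers become `Long`, or `BigInteger` if too large, everything else `Double`) is one assumed answer to the integer-vs-floating question.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for CsvSchema.ColumnType; only a few kinds are modeled.
enum ColumnKind { STRING, NUMBER, BOOLEAN }

final class RowCoercer {
    private RowCoercer() {}

    /** Converts raw CSV cell text to a typed value according to the declared column kind. */
    static Object coerce(String raw, ColumnKind kind) {
        if (raw == null || raw.trim().isEmpty()) {
            // Assumption: a blank cell in a non-string column becomes null.
            return kind == ColumnKind.STRING ? raw : null;
        }
        switch (kind) {
            case NUMBER:
                return parseNumber(raw.trim());
            case BOOLEAN:
                return Boolean.parseBoolean(raw.trim()); // "true" (any case) -> true, else false
            default:
                return raw;
        }
    }

    /** Integers become Long (BigInteger if out of range); other text goes through Double.parseDouble. */
    static Number parseNumber(String text) {
        if (text.matches("[+-]?\\d+")) {
            try {
                return Long.parseLong(text);
            } catch (NumberFormatException tooBig) {
                return new BigInteger(text);
            }
        }
        // Double.parseDouble also accepts forms like "1e3" or "NaN"; other text throws NumberFormatException.
        return Double.parseDouble(text);
    }

    /** Returns a new row map, in the same column order, with values coerced per column kind. */
    static Map<String, Object> coerceRow(Map<String, String> row, Map<String, ColumnKind> kinds) {
        Map<String, Object> typed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            ColumnKind kind = kinds.getOrDefault(e.getKey(), ColumnKind.STRING);
            typed.put(e.getKey(), coerce(e.getValue(), kind));
        }
        return typed;
    }
}
```

Each row read as a `Map` can be passed through `coerceRow` with a column-kind map built from the schema (assuming it is derived from each `CsvSchema.Column`'s name and `getType()`); with `age` declared as `NUMBER`, the value `"25"` comes out as the `Long` 25.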

cowtowncoder commented 4 years ago

I am not sure I understand. What exactly is your issue here? Is it the signature of what `readValues()` returns?

zenglian commented 4 years ago


updated

cowtowncoder commented 4 years ago

Unfortunately, there is currently no support for automatic type coercion based on the schema: all conversions require a target type (usually POJO properties) to work.

Originally there was a plan to handle coercion, however, so this is a reasonable improvement request.

alturkovic commented 10 months ago

I am also having issues with this. Using a schema like:

```java
CsvSchema.builder()
    .addColumn("name")
    .addNumberColumn("age")
    .addColumn("title")
    .build()
```

I would expect this CSV line, `"John Doe",25,Mr.`, to be parsed as `{ "name": "John Doe", "age": 25, "title": "Mr." }`.

But `age` gets parsed as a `String`.

cowtowncoder commented 10 months ago

I do agree that ideally token stream would reflect intended type, but that is not currently implemented.

@alturkovic It depends: if the target type is, say, `int`, then "age" will be coerced appropriately even if it is exposed as `JsonToken.VALUE_STRING`.

So how are you reading content for it to be "parsed as a String"?
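The point above is that the declared Java type, not the token type, decides the conversion. The JDK-only sketch below imitates that idea with plain reflection; it is a simplified illustration, not jackson-databind's deserializer code, and `Person`, `TargetCoercion` and the small set of supported types are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical POJO: the declared field type (int age) is what requests numeric conversion.
final class Person {
    String name;
    int age;
    String title;
}

final class TargetCoercion {
    private TargetCoercion() {}

    /** Converts cell text to the requested Java type; raw must be non-null for non-String targets. */
    static Object coerce(String raw, Class<?> target) {
        if (target == String.class) return raw;
        String t = raw.trim();
        if (target == int.class || target == Integer.class) return Integer.parseInt(t);
        if (target == long.class || target == Long.class) return Long.parseLong(t);
        if (target == double.class || target == Double.class) return Double.parseDouble(t);
        if (target == boolean.class || target == Boolean.class) return Boolean.parseBoolean(t);
        throw new IllegalArgumentException("unsupported target type: " + target);
    }

    /** Binds an all-String row to a new instance; each field's declared type drives coercion. */
    static <T> T bind(Map<String, String> row, Class<T> type) throws ReflectiveOperationException {
        T instance = type.getDeclaredConstructor().newInstance();
        for (Map.Entry<String, String> e : row.entrySet()) {
            Field field = type.getDeclaredField(e.getKey()); // unknown column -> NoSuchFieldException
            field.setAccessible(true);
            // Field.set unwraps the boxed Integer when the field is a primitive int.
            field.set(instance, coerce(e.getValue(), field.getType()));
        }
        return instance;
    }
}
```

With Jackson itself, the corresponding approach should be passing a POJO class with an `int age` property to `readerFor(...)` instead of `Map.class` or `JsonNode.class`; the sketch only mimics that target-driven conversion.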

alturkovic commented 10 months ago

Here is the code snippet that I used:

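```kotlin
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.dataformat.csv.CsvMapper
import com.fasterxml.jackson.dataformat.csv.CsvSchema

// assumed: a default CsvMapper (its configuration was not shown)
val csvMapper = CsvMapper()

val csv = """"John Doe",25,Mr."""

val schema = CsvSchema.builder()
    .addColumn("name")
    .addNumberColumn("age")
    .addColumn("title")
    .build()

val nodes = csvMapper.readerFor(JsonNode::class.java)
    .with(schema)
    .readValues<JsonNode>(csv)
    .readAll()

// if there is only one line in the csv, parse it as an object, otherwise as an array
val json = if (nodes.size == 1) nodes.first().toString() else nodes.toString()
```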

But the resulting json looks like:

```json
{
  "name": "John Doe",
  "age": "25",
  "title": "Mr."
}
```

And I would expect age to be an int instead of a String.
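Whether `age` prints as `"25"` or `25` depends only on the value's type when it is written out. The hand-rolled writer below is a hypothetical, simplified helper (not Jackson's serializer): it handles flat maps only, escapes just quotes and backslashes, and does not guard against non-finite doubles. It shows that a row holding a `Long` for `age` yields the expected output.

```java
import java.util.Map;

final class FlatJson {
    private FlatJson() {}

    /** Writes a flat map as a JSON object: Numbers/Booleans bare, other values quoted, null as null. */
    static String write(Map<String, ?> row) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, ?> e : row.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null) sb.append("null");
            else if (v instanceof Number || v instanceof Boolean) sb.append(v);
            else sb.append(quote(v.toString()));
        }
        return sb.append('}').toString();
    }

    /** Minimal escaping: only backslash and double quote (control characters are not handled). */
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }
}
```

Writing `{name="John Doe", age=25L, title="Mr."}` produces `{"name":"John Doe","age":25,"title":"Mr."}`, while the same row with `age="25"` keeps the quotes.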

cowtowncoder commented 10 months ago

Thank you, @alturkovic.

Ok, yes. `JsonNode` relies on the underlying token type, so that makes sense. And I agree that ideally token types would use the given type coercion information for Numbers, Booleans and nulls.
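A schema-aware parser along those lines would pick a token kind per cell from the declared column type. The sketch below is hypothetical and is not the `CsvParser` implementation: `DeclaredType` stands in for `CsvSchema.ColumnType`, `TokenKind` is a local enum that merely borrows a few `JsonToken` constant names, and the rules (empty typed cell means null, unparseable text falls back to a string token) are assumptions.

```java
// Hypothetical column kinds (stand-in for CsvSchema.ColumnType).
enum DeclaredType { STRING, NUMBER, BOOLEAN }

// Local enum named after a few JsonToken constants; not the real Jackson class.
enum TokenKind { VALUE_STRING, VALUE_NUMBER_INT, VALUE_NUMBER_FLOAT, VALUE_TRUE, VALUE_FALSE, VALUE_NULL }

final class TokenTyper {
    private TokenTyper() {}

    private static final String INT = "[+-]?\\d+";
    private static final String FLOAT = "[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?";

    /** Picks the token kind a schema-aware parser could expose for one cell. */
    static TokenKind classify(String cell, DeclaredType declared) {
        if (declared == DeclaredType.STRING) return TokenKind.VALUE_STRING;
        String t = cell == null ? "" : cell.trim();
        if (t.isEmpty()) return TokenKind.VALUE_NULL; // assumption: empty typed cell means null
        if (declared == DeclaredType.BOOLEAN) {
            if (t.equalsIgnoreCase("true")) return TokenKind.VALUE_TRUE;
            if (t.equalsIgnoreCase("false")) return TokenKind.VALUE_FALSE;
            return TokenKind.VALUE_STRING; // not a boolean literal: keep it as text
        }
        // NUMBER: distinguish integers from floating point; anything else stays a string token.
        if (t.matches(INT)) return TokenKind.VALUE_NUMBER_INT;
        if (t.matches(FLOAT)) return TokenKind.VALUE_NUMBER_FLOAT;
        return TokenKind.VALUE_STRING;
    }
}
```

Falling back to a string token (rather than failing) is one design choice; a real implementation might instead report an error for a malformed number in a `NUMBER` column.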

alturkovic commented 2 months ago

Any updates?

cowtowncoder commented 2 months ago

No updates. PRs welcome.