Open asfimport opened 4 years ago
Tim Lantz: Re: my side note above, I filed https://issues.apache.org/jira/browse/ARROW-7656 as well. I see that ARROW-6536 discusses why you need to set both in the C++ API, and that makes perfect sense, so this is just a documentation thing.
Joris Van den Bossche / @jorisvandenbossche:
Currently, I think the column_types option is only meant to specify the types, while nullability is part of the Field in a Schema and is not a fundamental property of the type itself.
Originally mentioned in: https://github.com/apache/arrow/issues/6243
High level description of the issue:
Minimal reproduction case:
Use case notes: this is especially noticeable when using pyarrow as a means to save data with a known schema to Parquet, as the ParquetWriter will check that the schema of a table being written matches the schema supplied to the writer. If that same schema is used to read the CSV data and contains a non-nullable field, a mismatch will be detected, resulting in an error, which is demonstrated below.
Potential source of issue:
Environment: Reproduced on Ubuntu 18.04 and OSX Catalina in Python 3.7.4.
Reporter: Tim Lantz
Note: This issue was originally created as ARROW-7655. Please see the migration documentation for further details.