Open asfimport opened 3 years ago
Antoine Pitrou / @pitrou: There isn't a parameter for this. It would probably be doable to add one, but would add non-trivial complexity to the CSV reader, so I'm rather reluctant. Which source is the data coming from?
Nithin Kumara Narayanaswamy Teekaramanaa: Hi Antoine,
In our case the source is a snapshot of a db saved as csv.
Antoine Pitrou / @pitrou: Do you use a built-in database function? Does it have options to customize the CSV format?
Nithin Kumara Narayanaswamy Teekaramanaa: This is not possible, as the source CSV files come from another system. But in principle, wouldn't it make sense for the reader to write null values where data is missing, provided the schema is specified?
Antoine Pitrou / @pitrou: It may as well be an error in the system producing the CSV files. How do we know? Generally, it's not a good idea to let errors pass silently.
In any case, as I said, this would add complication in the core of the CSV reader, which is why it hasn't been done (yet?).
Test scenario:
I read the same attached CSV file with pandas and pyarrow to compare the two.
1. With pandas, the file is read into a DataFrame without problems, and the result is as follows:
2. With pyarrow csv, I get a parse error:
Is there a parameter that can be set to fill null values in case the column values are missing for the specified schema?
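Since the thread indicates there is no such parameter, one possible workaround is to pad short rows before handing the data to the CSV reader. The following is only a minimal sketch using the Python standard library; the helper name `pad_short_rows`, its parameters, and the sample data are hypothetical and not part of pyarrow. It also returns the line numbers of the rows it padded, so that missing values are reported rather than passed silently.

```python
import csv
import io


def pad_short_rows(text, n_columns, fill=""):
    """Pad rows that have fewer than n_columns fields.

    Hypothetical helper (not a pyarrow API). Returns the rewritten CSV
    text and the list of input line numbers that were padded.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    padded_lines = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        if len(row) > n_columns:
            # Too many fields is more likely a real error, so do not hide it.
            raise ValueError(
                f"line {reader.line_num}: expected {n_columns} columns, got {len(row)}"
            )
        if len(row) < n_columns:
            padded_lines.append(reader.line_num)
            row = row + [fill] * (n_columns - len(row))
        writer.writerow(row)
    return out.getvalue(), padded_lines


# Made-up sample data: the last two rows are missing trailing columns.
sample = "a,b,c\n1,2,3\n4,5\n6\n"
fixed, padded_lines = pad_short_rows(sample, n_columns=3)
print(fixed)
print("padded lines:", padded_lines)
```

Assuming pyarrow is installed, the padded text could then be passed on as `pyarrow.csv.read_csv(io.BytesIO(fixed.encode()))`. Whether the empty fields end up as nulls depends on the column types and the `ConvertOptions` in use, so that part should be checked against the actual schema.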
Reporter: Nithin Kumara Narayanaswamy Teekaramanaa
Note: This issue was originally created as ARROW-12001. Please see the migration documentation for further details.