Currently, the error-aware CSV parser is a modified version of the existing Spark CSV parser.
The Spark CSV parser carries a lot of overhead for things that Mimir already handles itself (e.g., type inference, header detection).
The Spark CSV parser also already has some error detection capabilities (see org.apache.spark.sql.execution.datasources.FailureSafeParser). We might be able to leverage some of these as well.
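
For reference, here is a rough sketch of how that error handling surfaces through Spark's public CSV reader options. It assumes the `PERMISSIVE` mode and `columnNameOfCorruptRecord` options are the public face of the FailureSafeParser machinery. The object name `CorruptRecordSketch`, the helper `readWithCorruptColumn`, the sample rows, and the two-column schema are made up for illustration and are not taken from Mimir.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical sketch, not Mimir code.
object CorruptRecordSketch {

  // Explicit schema, so Spark does no type inference. The extra string
  // column receives the raw text of any row that fails to parse.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("_corrupt_record", StringType, nullable = true)
  ))

  // Parse in-memory CSV lines, keeping malformed rows instead of failing.
  def readWithCorruptColumn(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .csv(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("corrupt-record-sketch")
      .getOrCreate()

    // "oops" is not a valid integer, so the second row is malformed.
    val rows = readWithCorruptColumn(spark, Seq("1,alice", "oops,bob", "3,carol"))
    rows.show(truncate = false)

    spark.stop()
  }
}
```

Rows that parse cleanly should come back with a null `_corrupt_record`, while the malformed row keeps its original text in that column. How the remaining fields of a malformed row are filled in differs between Spark versions, so it is worth checking against the version Mimir builds on.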