Open kenhosr opened 7 years ago
Thanks, we'll look into this when we revisit #141. As mentioned there, generating the line count can take a significant amount of time with large CSV files. Ideally, however we deal with that will also take proper account of line breaks within a row, which should resolve this issue as well.
If I update the 2 methods below, the progress bar displays correctly for the above case. But the error log still needs to be fixed.
I wonder why CSVReader is used when we read rows to validate, while JLineNumberReader is used when we count the total rows.
protected def countRows(reader: JReader, schema: Schema): Int = {
  // Previous implementation, which counted physical lines:
  // val rowsAsHeader = if (schema.globalDirectives.contains(NoHeader())) 0 else 1
  // Try {
  //   val lineReader = new JLineNumberReader(reader) // don't close this JLineNumberReader, because it automatically closes the original reader.
  //   // It's resource/memory safe. The JLineNumberReader will be collected by the GC.
  //   @tailrec
  //   def readAll(): Int = {
  //     val result = Option(lineReader.readLine())
  //     if(result.empty) {
  //       lineReader.getLineNumber() + 1 //start from 1 not 0
  //     } else {
  //       readAll()
  //     }
  //   }
  //   readAll()
  // }.map(_ - rowsAsHeader) getOrElse -1

  // Count parsed CSV records instead, so line breaks inside a cell do not inflate the total.
  val csvReader = new CSVReader(reader)
  // size() returns a primitive Int, so Option(...) could never be None; Try is what yields -1 if reading fails.
  Try(csvReader.readAll().size()).getOrElse(-1)
}
as well as
class RowIterator(reader: CSVReader, progress: Option[ProgressFor]) extends Iterator[Row] {
  private var index = 0
  private var current = toRow(Option(reader.readNext()))
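The `readAll()` call in the `countRows` change above keeps every parsed row in memory just to obtain a count, which may matter for the large files mentioned in #141. A possible alternative, sketched here under assumptions, keeps reading line by line but only counts a row once any quoted cell has been closed. It assumes RFC 4180 style quoting (double quotes, with `""` as the escape), and `QuoteAwareRowCount` and the `hasHeader` flag are hypothetical stand-ins, not part of the project (the real code derives the header setting from the schema's `NoHeader` directive).

```scala
import java.io.{BufferedReader, Reader, StringReader}
import scala.util.Try

object QuoteAwareRowCount {
  // Number of double-quote characters on one physical line.
  private def quoteCount(line: String): Int = line.count(_ == '"')

  // Counts CSV rows while reading one line at a time. A physical line only
  // completes a row when no quoted cell is left open at its end.
  // Returns -1 if reading fails, mirroring the original countRows.
  def countRows(reader: Reader, hasHeader: Boolean): Int = Try {
    val lines = new BufferedReader(reader)
    var rows = 0
    var openQuote = false // true while inside a quoted cell that spans lines
    var line = lines.readLine()
    while (line != null) {
      // An odd number of quotes on a line opens or closes a quoted cell;
      // an escaped quote ("") contributes two and leaves the state unchanged.
      if (quoteCount(line) % 2 == 1) openQuote = !openQuote
      if (!openQuote) rows += 1
      line = lines.readLine()
    }
    if (hasHeader) math.max(rows - 1, 0) else rows
  }.getOrElse(-1)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: a header plus two records with line breaks in cells.
    val sample = "col1\n\"a\nb\"\n\"c\nd\"\n"
    println(countRows(new StringReader(sample), hasHeader = true)) // prints 2
  }
}
```

This sketch counts blank lines outside quotes as rows and does not handle backslash-escaped quotes, so its total could still differ from what CSVReader yields on unusual input.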
Git commit version: 3be31fbf48fd5d34a617e067dce9c48d1a43908b
Use the attached file to reproduce.
The CSV file contains 1 column, with 2 records in total besides the header row. There are line breaks within cells.
In the CSV Validator UI, set the csv and csvs files, then validate.
Expected: The progress bar should show as complete once the validation process has finished successfully.
Actual: Line 3 of 5
Archive.zip
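One reading of "Line 3 of 5" is that the total comes from physical lines while progress advances once per CSV record. The sketch below illustrates that mismatch with a hypothetical stand-in for the attached file (the real attachment's contents are not shown here), using only the standard library. `LineVsRecordCount` and its helpers are illustrative names, not project code.

```scala
import java.io.{LineNumberReader, StringReader}

object LineVsRecordCount {
  // Hypothetical stand-in for the attached file: one column, a header row,
  // and two records whose quoted cells each contain a line break.
  val sample: String = "col1\n\"a\nb\"\n\"c\nd\"\n"

  // Physical lines, as a line-based reader sees them.
  def countLines(text: String): Int = {
    val reader = new LineNumberReader(new StringReader(text))
    try {
      while (reader.readLine() != null) {}
      reader.getLineNumber
    } finally reader.close()
  }

  // Logical records: a line break ends a record only outside double quotes.
  def countRecords(text: String): Int = {
    var inQuotes = false
    var pending = false // true once the current record has some content
    var records = 0
    text.foreach {
      case '"' =>
        inQuotes = !inQuotes
        pending = true
      case '\n' if !inQuotes =>
        if (pending) records += 1
        pending = false
      case '\r' if !inQuotes => () // ignore the CR of a CRLF terminator
      case _ => pending = true
    }
    if (pending) records + 1 else records
  }

  def main(args: Array[String]): Unit = {
    println(s"physical lines: ${countLines(sample)}")             // physical lines: 5
    println(s"records including header: ${countRecords(sample)}") // records including header: 3
  }
}
```

With this sample, the two counts are 5 and 3, which matches the numbers in the reported progress message, though the actual cause in the validator may differ.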