digital-preservation / csv-validator

CSV Validation Tool and API (CSV Schema RI)
http://digital-preservation.github.io/csv-validator
Mozilla Public License 2.0

Progress bar info is incorrect when there are line breaks within a cell #142

Open kenhosr opened 7 years ago

kenhosr commented 7 years ago

Git commit version: 3be31fbf48fd5d34a617e067dce9c48d1a43908b. Use the attached file to reproduce.

The CSV file contains one column and two records in total, excluding the header row. The cells contain line breaks.

In the CSV Validator UI, select the CSV and CSVS files, then validate.

Expected: the progress bar should show a completed status once validation has finished successfully.

Actual: Line 3 of 5


Archive.zip
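The mismatch can be illustrated without the UI. The sketch below is hypothetical: it uses a made-up one-column sample (not the contents of Archive.zip) with a header and two records whose quoted cells each contain a line break, and counts physical lines with `java.io.LineNumberReader`, the class the line-based `countRows` uses under the alias `JLineNumberReader`. A line-based count sees five lines even though there are only three CSV records, header included.

```scala
import java.io.{LineNumberReader, StringReader}

object LineCountDemo {
  // Hypothetical sample: a header row plus two records whose quoted
  // cells each contain an embedded line break.
  val sample: String = "name\n\"first\nrecord\"\n\"second\nrecord\"\n"

  // Counts physical lines: read until readLine() returns null,
  // then ask the reader how many line terminators it has passed.
  def physicalLines(text: String): Int = {
    val reader = new LineNumberReader(new StringReader(text))
    while (reader.readLine() != null) {}
    reader.getLineNumber
  }

  def main(args: Array[String]): Unit =
    println(physicalLines(sample)) // 5 physical lines, but only 3 CSV records
}
```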

DavidUnderdown commented 7 years ago

Thanks, we'll look into this when we revisit #141. As mentioned there, generating the line count can take a significant amount of time with large CSV files, and ideally, however we deal with that, the count will properly take account of line breaks within a row, which should also resolve this issue.
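One way to address both concerns, cost on large files and line breaks within a row, is a single streaming pass that only treats a line break as a record separator when it falls outside a quoted field. The sketch below is a hypothetical hand-rolled scanner, not csv-validator's code or its CSV parser. It assumes RFC 4180-style quoting (an embedded quote written as `""`) and `\n` or `\r\n` terminators, so its results could diverge from `CSVReader` if a schema uses a different separator or quote character.

```scala
import java.io.{BufferedReader, Reader, StringReader}

object QuoteAwareCounter {
  /** Counts CSV records in one streaming pass, treating line breaks inside
    * double-quoted fields as field content. Memory use stays constant.
    * Assumes RFC 4180-style quoting and \n or \r\n record terminators.
    * The caller remains responsible for closing `in`.
    */
  def countRecords(in: Reader): Int = {
    val reader = new BufferedReader(in)
    var inQuotes = false
    var records = 0
    var pendingContent = false // current record has at least one character
    var c = reader.read()
    while (c != -1) {
      c.toChar match {
        case '"' =>
          // An escaped "" toggles twice, leaving the state unchanged.
          inQuotes = !inQuotes
          pendingContent = true
        case '\n' if !inQuotes =>
          records += 1 // an unquoted line break ends the record
          pendingContent = false
        case '\r' if !inQuotes =>
          () // part of a \r\n terminator; the following \n is counted
        case _ =>
          pendingContent = true
      }
      c = reader.read()
    }
    if (pendingContent) records += 1 // last record had no trailing line break
    records
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical one-column file: header plus two records with embedded line breaks.
    val sample = "name\n\"first\nrecord\"\n\"second\nrecord\"\n"
    println(countRecords(new StringReader(sample))) // 3 (header + 2 records)
  }
}
```

Because it reads one character at a time through a `BufferedReader`, the pass is linear in file size and never holds more than a buffer in memory. It does not, however, remove the cost of a second pass over the file, which is the concern raised in #141.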

kenhosr commented 7 years ago

If I update the two methods below, the progress bar displays correctly for the case above, but the error log still needs to be fixed.

I'm wondering why rows are read for validation with CSVReader, while the total row count is computed with JLineNumberReader.

protected def countRows(reader: JReader, schema: Schema): Int = {
    //    val rowsAsHeader = if (schema.globalDirectives.contains(NoHeader())) 0 else 1
    //    Try {
    //      val lineReader = new JLineNumberReader(reader) // don't close this JLineNumberReader, because closing it also closes the original reader.
    //                                                     // It's resource/memory safe: the JLineNumberReader will be collected by the GC.
    //      @tailrec
    //      def readAll(): Int = {
    //        val result = Option(lineReader.readLine())
    //        if (result.isEmpty) {
    //          lineReader.getLineNumber() + 1 // start from 1, not 0
    //        } else {
    //          readAll()
    //        }
    //      }
    //      readAll()
    //    }.map(_ - rowsAsHeader) getOrElse -1

    // Count CSV records (not physical lines), so a quoted cell containing
    // line breaks is counted once. Option(...) around an Int is always Some,
    // so use Try to fall back to -1 if parsing fails.
    val csvReader = new CSVReader(reader)
    scala.util.Try(csvReader.readAll().size()).getOrElse(-1)
  }
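`readAll()` materialises every row in memory before counting, which #141 suggests could be costly on large files. A streaming variant can call the reader's next-record method until it returns `null` (opencsv's documented end-of-input value for `readNext()`). The sketch below is hypothetical: `StreamingRowCount` and its `readNext` function parameter are illustrative names, not part of csv-validator. Taking a function keeps the sketch independent of which opencsv package the project imports. It also restores the header subtraction and the `-1` fallback from the line-based version.

```scala
import scala.annotation.tailrec
import scala.util.Try

object StreamingRowCount {
  /** Hypothetical helper: counts records by calling `readNext` until it
    * returns null, so only one record is in memory at a time.
    * `headerRows` is subtracted from the total; -1 is returned on failure.
    */
  def countRecords(readNext: () => Array[String], headerRows: Int): Int =
    Try {
      @tailrec
      def loop(n: Int): Int =
        if (readNext() == null) n else loop(n + 1)
      loop(0) - headerRows
    }.getOrElse(-1)
}
```

Under these assumptions, a call site might look like `StreamingRowCount.countRecords(() => csvReader.readNext(), rowsAsHeader)`, with `rowsAsHeader` computed from the `NoHeader()` directive as in the commented-out code.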

as well as

class RowIterator(reader: CSVReader, progress: Option[ProgressFor]) extends Iterator[Row] {

  private var index = 0
  private var current = toRow(Option(reader.readNext()))

rows