bmx-ng / text.mod

Text Utilities

[csv.mod] Last column empty issue #7

Closed thareh closed 1 year ago

thareh commented 1 year ago

Hey,

Using the following CSV-data:

foo;bar
1;

In my view, this should produce a row with 2 columns, foo=1 and bar=NULL, but that does not seem to be the case. Is this an issue with ZSV?

Thanks!

woollybah commented 1 year ago

Interesting, what results do you get?

woollybah commented 1 year ago

Did you remember to set the delimiter option to ; ? The default is a comma, so without that setting the result will always be 1 column.

thareh commented 1 year ago

The result I get is only 1 column, and yes, I did set the delimiter correctly. If I add any kind of text to the second column, it works as expected.

Thanks!

woollybah commented 1 year ago

Ah, I see. Does the file end at the end of the second line (i.e. there's no trailing new-line)? E.g.

foo;bar
1;<EOF>

as opposed to

foo;bar
1;
<EOF>

That does appear to give a different result for me too.

thareh commented 1 year ago

Ah yes, it does, but the data provided is for demonstration purposes. In the real-world data where I stumbled upon this issue, the row ends with a line ending and not EOF.

I can dig a bit deeper tomorrow and get back with something more.

Thanks!

woollybah commented 1 year ago

I can make it always return "at least" header.ColumnCount(), if that will help.

woollybah commented 1 year ago

Anyway, you can always ask for the value for column "bar". It will simply return Null if there isn't a column for that row.
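
The lookup described above can be approximated with calls that already appear in the listing further down this thread (GetHeader, ColumnCount, GetColumn, GetValue). This is only a sketch: ValueByName is a hypothetical helper rather than part of Text.CSV, sample.csv is an assumed file name for the two-line example data, and the module's own by-name accessor may look different.

```blitzmax
Framework BRL.Blitz
Import BRL.FileSystem
Import BRL.StandardIO
Import Text.CSV

' Hypothetical helper (not part of Text.CSV): returns the value stored under
' the header called "name", or Null when the row has no column at that position.
Function ValueByName:String(row:TCsvRow, name:String)
    Local header:TCsvHeader = row.GetHeader()

    If header = Null
        Return Null
    EndIf

    For Local i:Int = 0 Until header.ColumnCount()
        If header.GetHeader(i) = name
            ' The header knows this column, but the row may be shorter.
            If i >= row.ColumnCount()
                Return Null
            EndIf

            Return row.GetColumn(i).GetValue()
        EndIf
    Next

    Return Null
EndFunction

Local opts:TCsvOptions = New TCsvOptions()
opts.delimiter = ";"

' "sample.csv" is an assumed name for the foo;bar example data.
Local csv:TCsvParser = TCsvParser.Parse(ReadFile("sample.csv"), opts)

While csv.NextRow() = ECsvStatus.row
    Local row:TCsvRow = csv.GetRow()

    If row
        Print "bar: '" + ValueByName(row, "bar") + "'"
    EndIf
Wend
```

With a helper like this, a validity check does not depend on whether a trailing empty column is counted for the row.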

GWRon commented 1 year ago

Each TCsvRow has a method ColumnCount:Int(), which should equal the number of columns actually present in that row. Missing columns should be treated as NaN/Null. I would not expect the CSV module to "fill in" information that the original file did not contain. (What I mean is that ColumnCount() should not return a higher count than was actually found in the CSV data.)

Maybe @thareh should provide a complete example and what he expects bmx/the module to output.

thareh commented 1 year ago

The thing is, I'm comparing the header column count to the row column count to check whether the file is valid. But I can't seem to reproduce the issue now. Perhaps I was mistaken, so I do beg your pardon.

However, I've stumbled upon another issue:

Using the real-world data from data.7z, the reader seems to parse the last row of the file "full.csv" incorrectly. I copied the header and the same line to another file, "stripped.csv", and then it works fine. So I'm guessing it's an issue with ZSV, in that it doesn't reset properly between rows or similar?

Framework BRL.Blitz
Import BRL.FileSystem
Import BRL.StandardIO
Import BRL.StringBuilder
Import Text.CSV

' Builds one "header: 'value'" line for every column found in the row.
Function Debug:String(row:TCsvRow)
    Local sb:TStringBuilder = New TStringBuilder()
    Local header:TCsvHeader = row.GetHeader()

    For Local i:Int = 0 Until row.ColumnCount()
        Local col:SCsvColumn = row.GetColumn(i)

        sb.AppendLine(header.GetHeader(i) + ": '" + col.GetValue() + "'")
    Next

    Return sb.ToString()
EndFunction

Local opts:TCsvOptions = New TCsvOptions()
opts.delimiter = ";"

Local file:TStream = ReadFile("full.csv")
Local csv:TCsvParser = TCsvParser.Parse(file, opts)

Repeat
    Local status:ECsvStatus = csv.NextRow()

    If status <> ECsvStatus.row
        Exit
    EndIf

    Local row:TCsvRow = csv.GetRow()

    If Not row
        Continue
    EndIf

    Local header:TCsvHeader = row.GetHeader()

    If header And header.ColumnCount() <> row.ColumnCount()
        Local sb:TStringBuilder = New TStringBuilder()

        sb.AppendLine("ERROR: Header column count mismatch")
        sb.AppendLine(header.ColumnCount() + " <> " + row.ColumnCount())
        sb.AppendNewLine()

        For Local i:Int = 0 Until row.ColumnCount()
            sb.AppendLine(row.GetColumn(i).GetValue())
        Next

        Print sb.ToString()
        End
    EndIf

'   Print Debug(row)
Forever

Thanks!

GWRon commented 1 year ago

Change the line endings of full.csv to CR/LF (Windows-style) and you won't see the message. Keep them as LF (Unix-style) and the message appears.
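
For anyone who wants to repeat that experiment without a text editor, here is a minimal sketch of the LF to CR/LF conversion using plain byte streams. The output name full_crlf.csv is an assumption. The file is copied byte by byte, so its encoding is left untouched.

```blitzmax
Framework BRL.Blitz
Import BRL.FileSystem
Import BRL.StandardIO

' full.csv is the input from the thread; full_crlf.csv is an assumed output name.
Local src:TStream = ReadFile("full.csv")
Local dst:TStream = WriteFile("full_crlf.csv")

If src = Null Or dst = Null
    Print "Could not open input or output file"
    End
EndIf

' Value of the previously copied byte, -1 before the first one.
Local prev:Int = -1

While Not Eof(src)
    Local b:Int = ReadByte(src)

    ' Insert a CR (13) before every LF (10) that is not already preceded by one.
    If b = 10 And prev <> 13
        WriteByte(dst, 13)
    EndIf

    WriteByte(dst, b)
    prev = b
Wend

CloseStream(src)
CloseStream(dst)
```

Lines that already end in CR/LF are copied unchanged, so running it on a file that has already been converted should not double the CR bytes.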

GWRon commented 1 year ago

Playing a bit with "full.csv" (removing single characters from entries until it suddenly "works") shows that it is not one specific character that breaks the handling.

This sounds to me as if the "line length" is somehow an issue.

GWRon commented 1 year ago

Changing the CSV file from "UTF-8" to "UTF-7" makes it run where it would normally print the warning. So maybe it stumbles over the UTF-8 encoding and assumes wrong string lengths?

I tried to enforce UTF-8 reading with Local file:TStream = ReadFile("utf8::stripped.csv"), but this errors out with "Malformed line terminator".
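
One way to check the line-ending and encoding theories directly is to look at the raw bytes of the file. The following is a minimal sketch and not part of the module. It counts CR, LF and non-ASCII bytes (values of 128 and above, which is where UTF-8 multi-byte sequences live) and reports whether the file ends with a line feed. The file name is taken from the thread; everything else is illustrative.

```blitzmax
Framework BRL.Blitz
Import BRL.FileSystem
Import BRL.StandardIO

' Illustrative diagnostic: inspects the raw bytes of the file.
Local path:String = "full.csv"
Local stream:TStream = ReadFile(path)

If stream = Null
    Print "Could not open " + path
    End
EndIf

Local total:Int = 0
Local crCount:Int = 0
Local lfCount:Int = 0
Local highCount:Int = 0
Local last:Int = -1

While Not Eof(stream)
    Local b:Int = ReadByte(stream)
    total :+ 1

    If b = 13
        crCount :+ 1
    ElseIf b = 10
        lfCount :+ 1
    ElseIf b >= 128
        ' Part of a multi-byte sequence if the file is UTF-8.
        highCount :+ 1
    EndIf

    last = b
Wend

CloseStream(stream)

Print "Total bytes:     " + total
Print "CR bytes (13):   " + crCount
Print "LF bytes (10):   " + lfCount
Print "Bytes >= 128:    " + highCount

If last = 10
    Print "File ends with a line feed"
Else
    Print "File does not end with a line feed"
EndIf
```

Equal CR and LF counts suggest CR/LF line endings throughout, a CR count of zero suggests plain LF, and a non-zero count of bytes >= 128 means that byte length and character length differ for at least some rows.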

thareh commented 1 year ago

Thank you for your thorough investigation @GWRon!

I tried changing the line endings to CR/LF and it did get past the previous error, but the same thing happens later in the file. (For convenience, I had truncated the full.csv file that you have.)

How did you convert the file to UTF-7? I'd like to try it out myself.

Thanks!

GWRon commented 1 year ago

I simply changed it after opening the file in Geany (the text editor I use on Linux Mint Xfce).

Yet I think this might all be a red herring: just things that happen to make a faulty piece of code fail, without properly indicating which piece of code it is.