Closed joelverhagen closed 4 days ago
It looks like a bug in my delimiter-detecting code. If you specify the delimiter explicitly (Delimiter = ','), it works correctly. It looks like with this input it is detecting the vertical bar |. My algorithm is rather simple, in that it counts the frequency of candidate delimiters in the first line of the file without considering quoting. Do you actually need the delimiter detector feature, or can you always assume comma?
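To illustrate how a frequency count that ignores quoting can pick the wrong character, here is a minimal Python sketch. It is not the library's actual code: the candidate list, the comma fallback, and the function name detect_delimiter are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical candidate set; the library's real list may differ.
CANDIDATES = [",", "\t", ";", "|"]

def detect_delimiter(first_line: str) -> str:
    """Pick the candidate that occurs most often, ignoring quoting."""
    counts = Counter(ch for ch in first_line if ch in CANDIDATES)
    # Assumed fallback: use comma when no candidate appears at all.
    if not counts:
        return ","
    # On a tie, the earliest entry in CANDIDATES wins.
    return max(CANDIDATES, key=lambda c: counts[c])

# A comma-delimited header whose quoted field contains three vertical bars.
header = 'id,"a|b|c|d",name'
print(detect_delimiter(header))  # prints "|" (3 bars vs. 2 commas)
```

Because the bars inside the quoted field are counted like any other occurrence, they outnumber the real commas, which is the kind of misdetection described above.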
FWIW, you can access headers before calling Read(). I read the first line internally to determine the headers, so you can use GetName(int ordinal), or GetColumnSchema() if you need to see what columns are in the file before processing.
It looks like a bug in my delimiter-detecting code. If you specify the delimiter explicitly (Delimiter = ','), it works correctly. It looks like with this input it is detecting the vertical bar |. My algorithm is rather simple, in that it counts the frequency of candidate delimiters in the first line of the file without considering quoting. Do you actually need the delimiter detector feature, or can you always assume comma?
Wow, I didn't realize delimiter detection was happening! I would much rather be explicit anyway and set Delimiter = ','. I'll do that. Thanks for the tip.
FWIW, you can access headers before calling Read(). I read the first line internally to determine the headers, so you can use GetName(int ordinal), or GetColumnSchema() if you need to see what columns are in the file before processing.
My assertion works on the full header line (first line == expected line) so I'd need to join on comma or (more technically correct) serialize the string array to a CSV row to handle escaping. I was starting to go down that route but I think I prefer being lazy and using the explicit delimiter.
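For reference, the difference between joining on a comma and serializing to a proper CSV row can be sketched in Python with the standard csv module. The helper name to_csv_row and the sample header names are made up for this example; the same idea would apply to the string array returned by the reader.

```python
import csv
import io

def to_csv_row(fields):
    """Serialize a list of strings to one CSV row, quoting where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    # The writer appends a line terminator; drop it to compare single lines.
    return buffer.getvalue().rstrip("\r\n")

# Hypothetical header names, chosen to need escaping.
headers = ["id", "name, full", 'say "hi"']

print(",".join(headers))    # id,name, full,say "hi"
print(to_csv_row(headers))  # id,"name, full","say ""hi"""
```

The plain join silently changes the field count when a header contains a comma, while the serialized row round-trips, so an assertion against the serialized form is the stricter check.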
I am surprised the default behavior is delimiter auto-detection, instead of opting into that behavior by setting Delimiter = null or something. No big deal though. I hope others don't run into a wrong detection.
Thanks for your fast response!
Wow, I didn't realize delimiter detection was happening!
My documentation might suck, but that's one thing I actually call out. :P
My assertion works on the full header line
You can get the entire header line (record) by calling GetRawRecordSpan() after calling Create and before calling Read for the first time. The span will include the line ending character(s) though, so calling ReadLine on the TextReader might be easier.
Anyway, sounds like you got it figured out.
Yup, I'm good to go. Thanks so much!
Hey Mark! Found this on NuGet Insights.
Problem: When the underlying text reader has ReadLine called before parsing, a quoted field gets corrupted. I call ReadLine first to validate headers against the expected (loose schema validation). This works for most fields, but when there is a lot of quoting, it seems to cause a problem.
Minimal repro:
Expected output:
Actual output: