JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/

An error occurred reading the record #149

Closed Tommixoft closed 11 years ago

Tommixoft commented 11 years ago

Hi, thanks for this nice lib. Anyway, I have a problem. I'm very new to C#, so maybe I'm doing something wrong.

I have a CSV file, separated by tabs, with NO header. I have a custom class (example):

public class MyClass
   {
    [CsvField(Index = 0)]
    public string Barcode { get; set; }

    [CsvField(Index = 1)]
    [TypeConverter(typeof(double))]
    public double Qntity { get; set; }

}

I read it like this:

        var csv = new CsvReader(new StreamReader(DataFile), config);
        ExternalData = csv.GetRecords<MyClass>().ToList();

But I always get this error:

An error occurred reading the record.

Row: '1' (1 based)
Type: 'CustomPreview.MyClass'
Field Index: '1' (0 based)
Field Value: '0.00'

I always get errors if I use any type except string. If I make every property in my class a string, without using a converter, everything is OK. But it would be great to get the values as the types they are supposed to be, not only as strings.

What am I doing wrong?

JoshClose commented 11 years ago

What do your config settings look like?

Tommixoft commented 11 years ago

Thanks for the reply.

 CsvConfiguration config = new CsvConfiguration();

            if (Properties.Settings.Default.DataFileDelimiter.ToString().ToLower() == "t")
                config.Delimiter = "\t";
            else
                config.Delimiter = Properties.Settings.Default.DataFileDelimiter.ToString();

            config.Encoding = Encoding.UTF8;
            config.HasHeaderRecord = HasHeader; //in this case it's FALSE
            config.IsStrictMode = false;
            config.IsCaseSensitive = false;
            config.QuoteNoFields = true;
Tommixoft commented 11 years ago

PS: I did not create any type converters. Maybe I should? Or are there basic ones built in, for int, double, decimal, etc.?

JoshClose commented 11 years ago

Can you post some sample data here that fails?

Tommixoft commented 11 years ago

Here is a real line from the file (tab-separated, UTF-8):

4779034460079 Skystas skalbiklis universalus "ARLI clean" 1 kg. vnt UAB 'ARLI BALTIC' 21.00 Bazė A B C D 6.19 1.79 kg 6.19 0

Whole file: http://tommixoft.com/testas.csv

Tommixoft commented 11 years ago

Of course, the code I gave you earlier is not everything I use for this file (line); it is only a fraction of the class.

Tommixoft commented 11 years ago

The real class (it works because every field is declared as string):

public class xxx
{
    [CsvField(Index = 0)]
    public string Barkodas { get; set; }

    [CsvField(Index = 1)]
    public string Pavadinimas { get; set; }

    [CsvField(Index = 2)]
    public string MatVnt { get; set; }

    [CsvField(Index = 3)]
    public string Tiekejas { get; set; }

    [CsvField(Index = 4)]
 //    [TypeConverter(typeof(double))]
    public string PVM { get; set; }

    [CsvField(Index = 5)]
    public string Baze { get; set; }

    [CsvField(Index = 6)]
    public string A { get; set; }

    [CsvField(Index = 7)]
    public string B { get; set; }

    [CsvField(Index = 8)]
    public string C { get; set; }

    [CsvField(Index = 9)]
    public string D { get; set; }

    [CsvField(Index = 10)]
    public string TarosBarkodas { get; set; }

    [CsvField(Index = 11)]
//     [TypeConverter(typeof(double))]
    public string KainaLt { get; set; }

    [CsvField(Index = 12)]
//     [TypeConverter(typeof(double))]
    public string KainaEur { get; set; }

    [CsvField(Index = 13)]
    public string KilmesSalis { get; set; }

    [CsvField(Index = 14)]
    public string BazinisMatVnt { get; set; }

    [CsvField(Index = 15)]
 //    [TypeConverter(typeof(double))]
    public string BazVntKainaLt { get; set; }

    [CsvField(Index = 16)]
 //   [CsvHelper.TypeConversion.TypeConverter(typeof(int))]
    public string Buteliu { get; set; }
}
JoshClose commented 11 years ago

This worked just fine for me:

using System.IO;
using System.Linq;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            CsvConfiguration config = new CsvConfiguration();
            config.Delimiter = "\t";
            config.Encoding = Encoding.UTF8;
            config.HasHeaderRecord = false;
            config.IsStrictMode = false;
            config.IsCaseSensitive = false;
            config.QuoteNoFields = true;
            using (var stream = new MemoryStream() )
            using (var reader = new StreamReader( stream ) )
            using (var writer = new StreamWriter( stream, Encoding.UTF8 ) )
            using (var csv = new CsvReader(reader, config))
            {
                writer.Write("4779034460079\tSkystas skalbiklis universalus \"ARLI clean\" 1 kg.\tvnt\tUAB 'ARLI BALTIC'\t21.00\tBazė\tA\tB\tC\tD\t\t6.19\t1.79\t\tkg\t6.19\t0");
                writer.Flush();
                stream.Position = 0;

                var records = csv.GetRecords<Test>().ToList();
            }
        }

        public class Test
        {
            [CsvField(Index = 0)]
            public string Barkodas { get; set; }

            [CsvField(Index = 1)]
            public string Pavadinimas { get; set; }

            [CsvField(Index = 2)]
            public string MatVnt { get; set; }

            [CsvField(Index = 3)]
            public string Tiekejas { get; set; }

            [CsvField(Index = 4)]
            public double PVM { get; set; }

            [CsvField(Index = 5)]
            public string Baze { get; set; }

            [CsvField(Index = 6)]
            public string A { get; set; }

            [CsvField(Index = 7)]
            public string B { get; set; }

            [CsvField(Index = 8)]
            public string C { get; set; }

            [CsvField(Index = 9)]
            public string D { get; set; }

            [CsvField(Index = 10)]
            public string TarosBarkodas { get; set; }

            [CsvField(Index = 11)]
            public double KainaLt { get; set; }

            [CsvField(Index = 12)]
            public double KainaEur { get; set; }

            [CsvField(Index = 13)]
            public string KilmesSalis { get; set; }

            [CsvField(Index = 14)]
            public string BazinisMatVnt { get; set; }

            [CsvField(Index = 15)]
            public double BazVntKainaLt { get; set; }

            [CsvField(Index = 16)]
            public int Buteliu { get; set; }
        }
    }
}
Tommixoft commented 11 years ago

Well, maybe it works for you because you are not reading a file... I don't know. It still doesn't work. My code looks almost exactly like yours; mine just reads the file through a StreamReader.

JoshClose commented 11 years ago

Can you post the file somewhere that I can download it? You can email me the link if you don't want it public.

Tommixoft commented 11 years ago

I already added the link in the comment where I posted the line. THANKS!

http://tommixoft.com/testas.csv

JoshClose commented 11 years ago

Oh sorry! I completely missed that. I'll use that file.

Tommixoft commented 11 years ago

I really hope this is some kind of strange bug, because otherwise it would be a shame for me :) Such simple code and I'm stuck :)

Tommixoft commented 11 years ago

My code:

          int datatype = Properties.Settings.Default.DataFileType;

            CsvConfiguration config = new CsvConfiguration();

            if (Properties.Settings.Default.DataFileDelimiter.ToString().ToLower() == "t")
                config.Delimiter = "\t";
            else
                config.Delimiter = Properties.Settings.Default.DataFileDelimiter.ToString();

            config.Encoding = Encoding.UTF8;
            config.HasHeaderRecord = HasHeader;
            config.IsStrictMode = false;
            config.IsCaseSensitive = false;
            config.QuoteNoFields = true;

            switch (datatype)
            {
                case 0: 
                    {
                        var csv = new CsvReader(new StreamReader(DataFile, Encoding.UTF8), config);
                        ExternalData = csv.GetRecords<MyClassNAME>().ToList();
                        csv.Dispose();
                        label4.Text = "Finished...";
                        label4.Refresh();
                        break;
                    }

ExternalData is my 'global' variable: private IEnumerable ExternalData;

JoshClose commented 11 years ago

The exception message shows this:

An error occurred reading the record.

Row: '4305' (1 based)
Type: 'ConsoleApplication1.Program+Test'
Field Index: '4' (0 based)
Field Value: 'A'

Based on this, it's trying to convert "A" into a double for use with the PVM property.

Row 4305 has this:

4751004282294 "Atlantinių'' lašišų gabaliukai su pomidorų padažu 230g vnt UAB Norvelita 21.00 Bazė A B C D 4.19 1.21 kg 18.22 0

This means it's not being parsed as expected. The reason is that the second column starts with a ". In a CSV file, if a field contains a quote or a delimiter, the whole field needs to be enclosed in quotes.

I can see by the tabs that it should be:

0: 4751004282294
1: "Atlantinių'' lašišų gabaliukai su pomidorų padažu 230g
2: vnt
3: UAB Norvelita
4: 21.00
5: Bazė
6: A

What the parser is seeing is:

0: 4751004282324
1: Atlantinių'' lašišų gabaliukai savo sultyse 230g\tvnt\tUAB Norvelita\t21.00\tBazė\tA\tB\tC\tD\t\t4.19\t1.21\t\tkg\t18.22\t0\r\n5901653530164\tŠprotai rūkyti 250g \tvnt\tUAB Lupra\"
2: 21.00
3: Bazė
4: A

So the parser keeps reading until it reaches the next quote " that isn't doubled "".

This is an invalid CSV file. Are you exporting it from somewhere? If so, maybe there are some options that can be set to change some of this.

If quotes are never used as quotes, you could change the quote char to some obscure char value that would never be in the file, and that might work. But, if there is a tab in the file, then we have the same problem.
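The behavior described above can be sketched in a few lines. This is a simplified illustration, not CsvHelper's actual parser: `QuoteDemo` and `SplitQuoted` are hypothetical names, and the method only splits fields (it does not split records on line breaks). It treats a field as quoted only when the quote char is the field's first character, then reads up to the next quote that isn't doubled.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuoteDemo
{
    // Simplified quote-aware field splitter (illustration only).
    // A field that starts with the quote char runs to the next non-doubled
    // quote, swallowing any delimiters and line breaks along the way.
    public static List<string> SplitQuoted(string text, char delimiter, char quote)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool fieldStart = true;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c != quote)
                {
                    field.Append(c);
                }
                else if (i + 1 < text.Length && text[i + 1] == quote)
                {
                    field.Append(quote); // doubled quote is a literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing quote
                }
            }
            else if (c == quote && fieldStart)
            {
                inQuotes = true;         // opening quote at the start of a field
                fieldStart = false;
            }
            else if (c == delimiter)
            {
                fields.Add(field.ToString());
                field.Clear();
                fieldStart = true;
            }
            else
            {
                field.Append(c);         // includes quotes that appear mid-field
                fieldStart = false;
            }
        }

        fields.Add(field.ToString());
        return fields;
    }
}
```

Calling `QuoteDemo.SplitQuoted("a\t\"b\tc\"\td", '\t', '"')` yields three fields, because the tab inside the quotes is kept as data. Passing a quote char that never occurs in the text, such as `(char)127`, yields four fields, which is the same result as a plain split on tabs.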

Tommixoft commented 11 years ago

Thanks for the reply. Well, quotes should NOT be treated as quotes. The data uses every possible kind of quote and there are no rules; as you can see, all sorts of quotes appear in the records, and I need to ignore them. I need a simple splitter that quickly splits strings on the delimiter. Can you add an option to ignore quotes?

Thanks.

Tommixoft commented 11 years ago

For me it shows the error on the first row, and the first row is good :) But it still shows an error. Did you somehow get past the first row? :)

JoshClose commented 11 years ago

This is the code I used.

using System.IO;
using System.Linq;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            CsvConfiguration config = new CsvConfiguration();
            config.Delimiter = "\t";
            config.Encoding = Encoding.UTF8;
            config.HasHeaderRecord = false;
            config.IsStrictMode = false;
            config.IsCaseSensitive = false;
            config.QuoteNoFields = true;
            using (var stream = File.OpenRead("testas.csv"))
            using (var reader = new StreamReader( stream, Encoding.UTF8 ))
            using (var csv = new CsvReader(reader, config))
            {
                while (csv.Read())
                {
                    var record = csv.GetRecord<Test>();
                }
            }
        }

        public class Test
        {
            [CsvField(Index = 0)]
            public string Barkodas { get; set; }

            [CsvField(Index = 1)]
            public string Pavadinimas { get; set; }

            [CsvField(Index = 2)]
            public string MatVnt { get; set; }

            [CsvField(Index = 3)]
            public string Tiekejas { get; set; }

            [CsvField(Index = 4)]
            public double PVM { get; set; }

            [CsvField(Index = 5)]
            public string Baze { get; set; }

            [CsvField(Index = 6)]
            public string A { get; set; }

            [CsvField(Index = 7)]
            public string B { get; set; }

            [CsvField(Index = 8)]
            public string C { get; set; }

            [CsvField(Index = 9)]
            public string D { get; set; }

            [CsvField(Index = 10)]
            public string TarosBarkodas { get; set; }

            [CsvField(Index = 11)]
            public double KainaLt { get; set; }

            [CsvField(Index = 12)]
            public double KainaEur { get; set; }

            [CsvField(Index = 13)]
            public string KilmesSalis { get; set; }

            [CsvField(Index = 14)]
            public string BazinisMatVnt { get; set; }

            [CsvField(Index = 15)]
            public double BazVntKainaLt { get; set; }

            [CsvField(Index = 16)]
            public int Buteliu { get; set; }
        }
    }
}
JoshClose commented 11 years ago

There is currently no option for ignoring quotes completely when reading. You could set the quote char to something obscure that your file would never use. Like a smiley face or something.

Tommixoft commented 11 years ago

Sorry, but I'm not at my Visual Studio right now. Which property lets me set a custom quote char?

And REALLY, thanks for the help! I appreciate you spending your personal time on idiots like me :D

JoshClose commented 11 years ago

Configuration.Quote
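A minimal sketch of that workaround, reusing the configuration shown earlier in this thread. It assumes `Quote` is a `char` property on `CsvConfiguration` in the version under discussion; `TabConfig` is a hypothetical helper name, and ASCII 127 is just one example of a character unlikely to occur in the data.

```csharp
using CsvHelper.Configuration;

public static class TabConfig
{
    // Builds a configuration for a headerless, tab-separated file whose
    // fields may contain stray quotes that should be kept as plain text.
    public static CsvConfiguration Create()
    {
        var config = new CsvConfiguration();
        config.Delimiter = "\t";
        config.HasHeaderRecord = false;

        // Assumption: Quote is a char. Pick a character the file never
        // contains so no field is ever treated as quoted.
        config.Quote = (char)127;
        return config;
    }
}
```

With a setup like this, a literal tab inside a field would still split that field in two, since nothing can be quoted anymore.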

Tommixoft commented 11 years ago

Thanks! But adding the ability to ignore quotes would speed up parsing even more for people who don't need quoting :) Thanks again for the help and this lib.

JoshClose commented 11 years ago

Adding the ability to ignore quotes would probably slow it down some since I would also have to do a check when a quote char is found to check if it should be ignored.

What happens if the field contains a delimiter; in your case, a tab? It wouldn't parse correctly then either. If you change all the tabs to commas and open the file in Excel, Excel has the same behavior on the same line.

I could add this feature, but I really don't want to for a few reasons.

  1. It will slow down the parser.
  2. It's not valid CSV format.
  3. Excel can't interpret it. CsvHelper has the ability to parse some pretty bad files, but it's Excel compatible.

I will have to think about it. I'm pretty sure it would be simple to add, I'm just not sure I want to at this time.

Tommixoft commented 11 years ago

Anyway, I'm getting the same error even with your code and my crazy quote char. Maybe this is because of regional settings... I have to check it.

Anyway, I was thinking that the parser could check the config to ignore quotes and skip some part of the code, so it doesn't check whether quotes are needed for every column :)

Anyway, I agree that this file is crap, but this is what the client gives me. I guess it is some export from their ERP application.

Thank you again.
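The regional-settings theory is easy to check in isolation. The sketch below does not touch CsvHelper; it only shows how .NET's own `double.TryParse` treats a value like 0.00 under different number formats. `DecimalSeparatorDemo` and its methods are hypothetical names, and the custom `NumberFormatInfo` is a stand-in for any culture that uses a comma as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class DecimalSeparatorDemo
{
    // Stand-in for a regional setting whose decimal separator is a comma.
    public static NumberFormatInfo CommaDecimal()
    {
        return new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = " "
        };
    }

    // True when the text parses as a double under the given format provider.
    public static bool ParsesWith(string text, IFormatProvider provider)
    {
        double value;
        return double.TryParse(text, NumberStyles.Float, provider, out value);
    }
}
```

`ParsesWith("0.00", CultureInfo.InvariantCulture)` succeeds, while `ParsesWith("0.00", DecimalSeparatorDemo.CommaDecimal())` does not, because the period is not a decimal separator in that format. If a conversion relies on the thread's current culture (an assumption here, not something confirmed in this thread), a field value of 0.00 could fail in exactly this way on a machine with comma-decimal settings.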

Tommixoft commented 11 years ago

ALSO, it works if I use all strings in the class, so why does it work with such crazy quotes (the ones in the file) in string mode? Does it do some additional validation? And the error I'm getting is always at a good spot: when I declare a double, it shows Value: 0.00 (well, this is 0, but still a double :D)

JoshClose commented 11 years ago

If you use all strings, you'll have the case where one of the properties will contain several fields of data in it. It won't fail because it's not trying to convert Atlantinių'' lašišų gabaliukai savo sultyse 230g\tvnt\tUAB Norvelita\t21.00\tBazė\tA\tB\tC\tD\t\t4.19\t1.21\t\tkg\t18.22\t0\r\n5901653530164\tŠprotai rūkyti 250g \tvnt\tUAB Lupra\" into a double.

The parser doesn't check if quotes are needed for every column. If the column has a quote as the first character, then it will see it as a quoted field. If it's not a quote, then it can have quotes later in the field and it won't matter.

I can try adding that config setting to ignore quotes later tonight and see if that will even parse the file correctly. The problem is, if any of the fields contain a delimiter, you'll run into this issue also. If the data is text, it's fairly likely that it may contain a tab in it.

Maybe you can ask your client if they have any options with exporting, because it's creating an invalid CSV file.

Tommixoft commented 11 years ago

But the application fails at the value 0.00 or 21.00, not at that awful string. No, in my country clients do not follow standards; everybody has their own :D Even barcodes... they use them however they want :)

Well, if the text contains the delimiter, then the client will have to work out how to avoid that. But when product names have various quotes, they won't change anything. As you can see, there are many products, and they won't change names just to avoid quotes :(

I really appreciate your help. Don't add functions just because of one case. If you think this will be a valuable addition to your lib, I will be happy :)

For now I will have to use .NET's Split function to parse the data. I hate that :)

JoshClose commented 11 years ago

Well, if splitting on tabs works perfectly, then there probably isn't a tab in the text of the field anywhere. Otherwise you'll run into the same issue when splitting.

Can you ask them what their exporter does if the field contains a tab?

Tommixoft commented 11 years ago

I have not tried Split on this file yet. But did you find a line where a tab is not in its place? Your lib does count all lines correctly (36566 lines), so the file does not have more tabs than needed.

I think their exporter doesn't give a damn. It takes all records, adds tabs between them, and that's it :) Also, I don't think they will know the answer. As you can see from their file, the exporter exports anything :D Validation is not important :D

PS: I tested the file, and god damn, it's invalid! There are lines where only 3 or 1 of the 17 columns exist (no separators). But the good part is that your lib still reads such a crappy file. Still, you saved me a lot of time :)
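For reference, the plain Split approach with a column-count check might look like the sketch below. It assumes every valid line has exactly 17 tab-separated columns, as in the class posted earlier, and that no field contains a tab. `TabSplitReader` and its members are hypothetical names, and numeric fields are parsed with the invariant culture so that values like 21.00 do not depend on regional settings.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class TabSplitReader
{
    public const int ExpectedColumns = 17;

    // Splits one line on tabs; returns null when the column count is wrong.
    public static string[] SplitLine(string line)
    {
        string[] fields = line.Split('\t');
        return fields.Length == ExpectedColumns ? fields : null;
    }

    // Parses a numeric field such as "21.00" independently of regional settings.
    public static bool TryParsePrice(string text, out double value)
    {
        return double.TryParse(text, NumberStyles.Float,
            CultureInfo.InvariantCulture, out value);
    }

    // Reads the file line by line, keeps well-formed rows, and records the
    // 1-based numbers of lines that do not have the expected column count.
    public static List<string[]> ReadValidRows(string path, List<int> rejectedLineNumbers)
    {
        var rows = new List<string[]>();
        int lineNumber = 0;
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            lineNumber++;
            string[] fields = SplitLine(line);
            if (fields == null)
            {
                rejectedLineNumbers.Add(lineNumber);
            }
            else
            {
                rows.Add(fields);
            }
        }
        return rows;
    }
}
```

Collecting the rejected line numbers rather than throwing makes it possible to report the short lines (the ones with only 1 or 3 columns) back to whoever produces the export.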

chriskeeble commented 11 years ago

Hi Josh,

Also looking for this feature - i.e. a configuration option to treat quotes (ASCII 34) as content rather than a text delimiter. I'm aware that this will mean a tab-delimited file can't contain tabs in any field values, but that's ok.

Currently using ASCII 127 (delete) as the Configuration.Quote value appears to work (hopefully without any unwanted side effects).

Did you decide whether you can include this option?

Many thanks

Chris

Tommixoft commented 11 years ago

You can set another quote mark in the configuration. My problem wasn't because of quotes :)

JoshClose commented 11 years ago

@chriskeeble so you would like a config option for IgnoreQuotes or something like that? So basically the parser would pretend to know nothing about quotes?

chriskeeble commented 11 years ago

@JoshClose Yes, that would be great if you're able.

JoshClose commented 11 years ago

I will add it to my list.

JoshClose commented 11 years ago

Configuration.IgnoreQuotes has been added and will be a part of the 2.0 release.

chriskeeble commented 11 years ago

@JoshClose Many thanks Josh, looking forward to the new release.

JoshClose commented 11 years ago

I was going to release it last night, but had a couple questions come in about possible bugs, so I wanted to make sure there weren't any known bugs.

I'll hopefully get it pushed out tonight.