JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/

Variadic number of columns #1042

Closed alancapc closed 6 years ago

alancapc commented 6 years ago

How do I create a class with a variadic number of properties?

In other words, how do I map a .csv file like the one below, which has a variadic (not always the same) number of headers?

keys,v1,v2..vn
k1,value11,value12..value1m
k2,value21,value22..value2m
...
kn,valuen1,valuen2..valuenm

I did find the following code snippet from the author:

public class CsvRecord
{
    public string[] Fields { get; set; }
}

var records = new List<CsvRecord>();
while( csv.Read() )
{
    var record = new CsvRecord();
    record.Fields = csv.CurrentRecord;
    records.Add( record );
}

Is it still valid given it is from 2013 or does the library contain a new feature that does it better?

Apologies if I am missing a fundamental concept of this package.

polterguy commented 6 years ago

For the record, if somebody sees this, I think you're supposed to use the CsvParser, which is more generic in form. If the author could verify this, I'd appreciate it.

I'm using it for this (https://github.com/polterguy/phosphorusfive) to restore MySQL backups, where I don't know either the headers or the structure of the CSV file ...
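For anyone landing here later: in the CsvHelper versions current at the time of this thread, CsvParser.Read() returns the next row as a string[], or null at end of file, which makes it a good fit for files whose column count isn't known up front. A minimal sketch (the file name data.csv is just a placeholder):

```csharp
using System;
using System.IO;
using CsvHelper;

class Program
{
    static void Main()
    {
        // Hypothetical input file; any CSV with a variable number of columns works.
        using (var reader = new StreamReader("data.csv"))
        using (var parser = new CsvParser(reader))
        {
            // Treat the first row as the header row.
            string[] headers = parser.Read();

            string[] row;
            while ((row = parser.Read()) != null)
            {
                // row.Length may differ from headers.Length if the file is ragged.
                Console.WriteLine(string.Join(" | ", row));
            }
        }
    }
}
```

Since each row comes back as a plain string array, no statically typed record class is needed.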

jamesbascle commented 6 years ago

That is accurate, Thomas. You may also be able to use the reader, calling GetRecord with different types based on the number/contents of columns in the particular row, via the CsvReader.Context.Record property.


polterguy commented 6 years ago

Thx James. I am currently using the following code. Do you think this is the best way to achieve what I want ...?

using (var stream = File.OpenText (rootFolder + infile)) {

    // Using the generic "CsvParser" since we've got no idea of the file's structure.
    using (var csv = new CsvParser (stream)) {

        // Reading the headers.
        var headers = csv.Read ();

        // Creating our insert SQL, which is reused, only with different parameters.
        string insertSql = CreateInsertSQL (table, types, headers);

        // Iterating through each record in file.
        var record = csv.Read ();
        long no = 0;
        while (record != null) {

            // Creating our SQL command.
            var cmd = new MySqlCommand (insertSql, connection);

            // Adding SQL parameters to our command.
            for (var idx = 0; idx < headers.Length; idx++) {
                if (types [headers [idx]] != MySqlDbType.Timestamp) {
                    var val = Convert (record [idx], types, headers [idx]);
                    cmd.Parameters.AddWithValue ("@val" + idx, val);
                }
            }

            // Executing our SQL command.
            cmd.ExecuteNonQuery ();

            // Reading our next record.
            record = csv.Read ();

            // Incrementing counter.
            no += 1;
        }
        e.Args.Value = no;
    }
}

There is some irrelevant code in my example, related to my own domain problem and toolchain (Phosphorus Five and creating SQL insert statements). However, do the CsvHelper parts look like the best way to achieve what I am trying to do ...?

I assume this also parses the CSV file directly from the stream, right? As in, it doesn't load the entire file into memory or anything ...?

The reason I'm asking is that this is a database backup restore method, which could potentially see CSV files with hundreds of thousands of records, at which point reading the entire file into memory would obviously be a "no-no" ...

BTW, thx for an awesome tool James :)

JoshClose commented 6 years ago

That works, and yes that's streaming the data.

If you want to have CsvHelper do the type conversion for you, you can use CsvReader instead of CsvParser.

void Main()
{
    using (var stream = new MemoryStream())
    using (var writer = new StreamWriter(stream))
    using (var reader = new StreamReader(stream))
    using (var csv = new CsvReader(reader))
    {
        writer.WriteLine("Id,Name");
        writer.WriteLine("1,one");
        writer.WriteLine("2,two");
        writer.Flush();
        stream.Position = 0;

        csv.Read();
        csv.ReadHeader();
        while (csv.Read())
        {
            var id = csv.GetField<int>("Id");
            var name = csv.GetField<string>("Name");
        }
    }
}

If you want me to try and fit this into your example, let me know.

JoshClose commented 6 years ago

Here is an old DataTable example from the 2.0 docs. It shouldn't take many changes to work with the current version, and you could use commands instead. This way ends up putting all the data into memory in the DataTable, so it's not ideal.

while( csv.Read() )
{
    var row = dt.NewRow();
    foreach( DataColumn column in dt.Columns )
    {
        row[column.ColumnName] = csv.GetField( column.DataType, column.ColumnName );
    }
    dt.Rows.Add( row );
}
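The snippet above assumes the DataTable already has its columns defined. A sketch of building them from the CSV header first, assuming the header-access API of the CsvHelper versions current at the time of this thread (column types default to string here, since the real types aren't known up front):

```csharp
// Read the header row first so the DataTable's columns match the file.
csv.Read();
csv.ReadHeader();

var dt = new DataTable();
foreach (var header in csv.Context.HeaderRecord)
{
    // Without schema information, default every column to string.
    dt.Columns.Add(header, typeof(string));
}

while (csv.Read())
{
    var row = dt.NewRow();
    foreach (DataColumn column in dt.Columns)
    {
        row[column.ColumnName] = csv.GetField(column.DataType, column.ColumnName);
    }
    dt.Rows.Add(row);
}
```

If the column types are known from elsewhere (for example, a database information schema, as in the code above), they can be substituted for typeof(string) so GetField converts each value on the way in.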
polterguy commented 6 years ago

If you want me to try and fit this into your example, let me know.

Thank you Josh, if you think you can improve my usage of CsvHelper, I would deeply appreciate it :)

The problem I saw as I started using this is that I don't initially know the type of each column, which the types object encapsulates. You can't see it in the code, but it queries the MySQL information schema database for the MySqlDbType of each column. Here is the entire code. As you can see, the types variable is a Dictionary<string, MySqlDbType>, mapping from column name to the correct database type, while the Convert method converts from the string representation to the MySQL type necessary to correctly construct my MySqlParameter during insertion.

I must confess I got a little bit confused, and the above code I am linking to is not the code I am most proud of (blush), partially due to a lack of understanding of how CsvHelper works. If you can provide some simple changes or suggestions for how I can improve it, I would be highly grateful. Of particular interest is whether I could apply the conversion more easily, which I think is terrible today, to be honest ...

The problem is the following: the way I see it, there's simply no way I can use statically compiled C# code, which is why I jumped to the CsvParser in the first place ...
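One middle ground worth noting: CsvReader can also hand each row back as a dynamic object keyed by header name, which avoids a statically compiled record type while keeping header-based access. A hedged sketch, assuming the GetRecords&lt;dynamic&gt;() support present in CsvHelper versions of this era (backup.csv is a placeholder path):

```csharp
using (var reader = new StreamReader("backup.csv"))
using (var csv = new CsvReader(reader))
{
    foreach (IDictionary<string, object> record in csv.GetRecords<dynamic>())
    {
        // Each record is an ExpandoObject; keys are the header names.
        foreach (var pair in record)
        {
            Console.WriteLine($"{pair.Key} = {pair.Value}");
        }
    }
}
```

All values arrive as strings this way, so any type conversion (like the Convert method above) still has to happen before building the SQL parameters.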

JoshClose commented 6 years ago

That will definitely work. The 2 main benefits of CsvHelper are parsing the CSV file correctly with some level of fault tolerance, and converting the row into an object. If you're going directly into a database, you probably don't need to put it into an object at all. CsvHelper could do the type conversion for you, but you already have that part written based on the SQL types, so no big deal. I think what you have is fine. There are many ways you can do it.

polterguy commented 6 years ago

Thx (: