JoshClose / CsvHelper

Library to help reading and writing CSV files
http://joshclose.github.io/CsvHelper/
Other
4.73k stars 1.06k forks source link

FastDynamicObject broke ExpandoObject workflow #2269

Open mrtristan opened 3 months ago

mrtristan commented 3 months ago

Was on version 31.0.2, bumped to 33.

was doing GetRecords<dynamic> and then .Cast<ExpandoObject>().ToArray() - this now yields Unable to cast object of type 'CsvHelper.FastDynamicObject' to type 'System.Dynamic.ExpandoObject'.

Is there any out of the box approach to reading results as expando?

very related: https://github.com/JoshClose/CsvHelper/issues/2251

JoshClose commented 3 months ago

ExpandoObject is extremely slow, which is why I'm now using a custom FastDynamicObject implementation instead. You can cast to IDictionary<string, object> and use that if you need to.

Generally, this is how it's used.

dynamic records = csv.GetRecords<dynamic>();
foreach (var record in records)
{
}

Why do you want the ExpandoObject? Maybe we can figure out an alternative.

mrtristan commented 3 months ago

@JoshClose thanks for the quick response

i built a data processing pipeline that executes arbitrary c# scripts against arbitrary data sets. there are a few steps where the data lands in csv files which need to be post-processed.

essentially, that expando was being passed to a compiled script using this lib and we have a bunch of scripts that "just use" the objects in a obj.Prop1 = obj.PropB fashion.

we could certainly move over to more of a dictionary accessor type pattern in the scripts but it'll involve a little bit of upfront pain.

i'm all for speed though so if the introduction of FastDynamicObject yields an overall faster result then maybe this isn't the end of the world.

JoshClose commented 3 months ago

Here is some code I just benchmarked.

int count = 1_000_000;

public void Expando()
{   
    for (var i = 0; i < count; i++)
    {
        dynamic o = new ExpandoObject();
        o.Id = i;
        o.Name = $"Name {i}";
        o.Guid = Guid.NewGuid();
        o.Date = new DateTimeOffset();
    }
}

public void FastDynamic()
{
    for (var i = 0; i < count; i++)
    {
        dynamic o = new FastDynamicObject();
        o.Id = i;
        o.Name = $"Name {i}";
        o.Guid = Guid.NewGuid();
        o.Date = new DateTimeOffset();
    }
}

image

It would probably be best to switch from ExpandoObject to IDictionary<string, object?>. I know other libraries like Dapper use a custom dynamic object like this too, and I believe they all implement IDictionary<string, object?> like ExpandoObject does.

mrtristan commented 3 months ago

interesting. didn't realize you still supported that dot-syntax property accessor pattern on that class. maybe i can still leverage it without explicitly trying to cast it to expando while we work on pivoting over to the dictionary usage.

andreaslennartz commented 3 months ago

Thanks for this discussion @mrtristan (and of course @JoshClose for this great library), I ran into the same issue when bumping from 31 to 33. For me, this was a breaking change which was really hard to spot this time.

I have an ETL library (ETLBox), which supports the ExpandoObject in lots of different places. The library works very well also for big amounts of data, e.g. loading millions of rows from a source like a csv file, transforming/cleansing/enriching it and the saving it in a in database. My experience so far is that the creation of objects is very rarely the bottleneck - most of the time it is the slow writing speed of the target, or time-consuming transformation in between. Just so that you know where I am coming from.

To be "compatible" with my other code, I decided to keep the ExpandoObject, and used this simple conversion function:

ExpandoObject bufferObject= ConvertToDynamic(CsvReader.GetRecord<dynamic>());

private ExpandoObject ConvertToDynamic(IDictionary<string, object> fastDynamicObject) {
    if (fastDynamicObject == null) return null;  
    IDictionary<string,object> result = new ExpandoObject();
    foreach (var item in fastDynamicObject) 
        result.Add(item.Key, item.Value);
    return (ExpandoObject)result;
}

I have two questions here:

JoshClose commented 3 months ago

Well, it's not a breaking change since GetRecords<dynamic> returns IEnumerable<dynamic> and not IEnumerable<ExpandoObject>. I tried to make it as ExpandoObject compatible as possible by implementing IDynamicMetaObjectProvider and IEnumerable<string, object?>.

I'll think about a way to do this. Maybe I can get an IDictionary<string, object?> from the ObjectResolver instead of newing up a FastDynamicObject. That would give you the ability to have it spit out an ExpandoObject instead.

andreaslennartz commented 3 months ago

Yes, you are right, the method signature hasn't changed, so it's not a breaking change. Sorry for this. Would be great if a way of obtaining the ExpandoObject without the additional conversion could find it's way into one of the next versions!

OneScripter commented 3 months ago

Broken for me as well. I need to convert each record into a Dictionary without knowing the key values at runtime.

JoshClose commented 3 months ago

@OneScripter Can you give an example of how you were using it before this change?

OneScripter commented 3 months ago

Here's the modified code actually, and it works now, but I feel like there's a better way.


List<Dictionary<string, object>> items = new List<Dictionary<string, object>>();

var records = csv.GetRecords<dynamic>();

foreach (var record in records)
{
    Dictionary<string, object> record = new Dictionary<string, object>();

    foreach (KeyValuePair<string,object> kvp in record)
    {
        if (!record.ContainsKey(kvp.Key))
        {
            record.Add(kvp.Key, kvp.Value);
        }
    }

    if (record.Count > 0)
    {
        items.Add(record);
    }
}
JoshClose commented 3 months ago

What were you doing before with ExpandoObject?

OneScripter commented 3 months ago

List<Dictionary<string, object>> items = new List<Dictionary<string, object>>();

var records = csv.GetRecords<dynamic>();

foreach (var r in records)
{
   var item = r as System.Dynamic.ExpandoObject;

   if (null != item)
   {
       var keys = item.Select(a => a.Key).ToList();
       var values = item.Select(a => a.Value).ToList();

       Dictionary<string, object> record = new Dictionary<string, object>();

       foreach (KeyValuePair<string, object> kvp in item)
       {
           record.Add(kvp.Key, kvp.Value);
       }

       items.Add(record);
   }
}
JoshClose commented 3 months ago

If you're putting it into a Dictionary, you can just do this instead.

var records = csv.GetRecords<dynamic>().OfType<IDictionary<string, object>>();

That will give you an IEnumerable<IDictionary<string, object>>.

eduarddejong commented 1 month ago

Edit: in my case this is about writing to a CSV file rather than reading, but it seems related.

After loosing quite a serious amount of hours trying to figure out what was wrong my code, I had to discover that it was the update of this NuGet package which broke everything. I have updated from 31 to 33.

In my project I am using my own DynamicObject implementation, which works just like ExpandoObject, but it can do things which ExpandoObject cannot.

It was actually very happy that this CsvHelper supported it directly out of the box, making things super simple and efficient.

Sadly DynamicObject support is clearly broken just like ExpandoObject support. I am getting a corrupted CSV output with incorrect columns.

My reason for DynamicObject is that I am using a mixture of real properties (which always need to be there and do not need to be dynamic) and dynamic properties (for variable data colums, implemented by a Dictionary, just like ExpandoObject does). This is making it more efficient then everything dynamic.

Such a mix of properties is impossible with ExpandoObject, as it cannot be overridden.

Initial reason I am using this DynamicObject though was actually that I am also binding it to a WPF DataGrid for presentation, which enables a way of fully variable dynamic data binding which otherwise appears to be impossible. So being able to directly export it to CSV was amazing.

The new FastDynamicObject might be technically more efficient, but I did not have any performance issue with DynamicObject. Actually, now having to implement another extra conversion layer between what I currently have and a new way of working with this library rather cost performance.

And it looks like the FastDynamicObject in this project cannot be directly either, since it is internal.

Please reconsider restoring old functionality if somehow possible. Thanks a lot!

Great project anyway.

JoshClose commented 1 month ago

Can you give an example? You should be able to write with a DynamicObject and anything that implements IDynamicMetaObjectProvider