fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Proposal: ReadCsv schema provider as lambda function #454

Open pkese opened 5 years ago

pkese commented 5 years ago

The combination of inferTypes and schema is rather inflexible.

In my case (I'm processing lots of slightly different .csv files), I'd much prefer to be able to provide types based on the actual column titles that the reader finds in the .csv file.

I'd propose a typeResolver parameter to ReadCsv that would accept a lambda function that resolves column titles to target types:

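    // typeResolver is the proposed parameter; it does not exist in ReadCsv today
    Frame.ReadCsv("sample.csv", typeResolver=(fun (title: string) ->
        match title with
        | t when t.Contains("Name") -> Some "string"
        | t when t.EndsWith("Id") -> Some "guid"
        // capitalized title: first character is upper case
        | t when t.Length > 0 && System.Char.IsUpper(t.[0]) -> Some "float?"
        | _ -> None))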
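
Until something like this exists, a similar effect might be approximated by reading the header line first and turning the resolver's answers into a schema string. The following is only a rough sketch: the names typeResolver and schemaFromHeader are hypothetical, the header is split naively on commas (no quoted titles), and the "Title=type" output format is an assumption that would need to be checked against what the schema parameter actually accepts.

```fsharp
open System

// Hypothetical resolver in the shape proposed above:
// column title -> optional target type name.
let typeResolver (title: string) : string option =
    match title with
    | t when t.Contains("Name") -> Some "string"
    | t when t.EndsWith("Id") -> Some "guid"
    | t when t.Length > 0 && Char.IsUpper(t.[0]) -> Some "float?"
    | _ -> None

// Builds a comma-separated list of "Title=type" overrides from a header line.
// Columns for which the resolver returns None are left out, so they would
// fall back to whatever type inference the reader applies.
// Assumption: titles contain no quoted commas.
let schemaFromHeader (resolver: string -> string option) (header: string) : string =
    header.Split(',')
    |> Array.map (fun title -> title.Trim())
    |> Array.choose (fun title ->
        resolver title |> Option.map (fun typ -> sprintf "%s=%s" title typ))
    |> String.concat ","

// Example header; in practice this would be the first line of the .csv file.
let schema = schemaFromHeader typeResolver "FirstName,OrderId,Price,notes"
// schema = "FirstName=string,OrderId=guid,Price=float?"
```

The resulting string could then be tried as the schema argument of ReadCsv. This costs an extra read of the header per file, which is part of why a resolver parameter inside the reader would be nicer.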

My aim at this point is to:

I'm willing to look into the issue and try to provide an implementation as well.

tpetricek commented 5 years ago

This would not work for the CSV provider, because the CSV provider needs to take a literal as a parameter (i.e. just a string), but it sounds pretty sensible for the CSV parser in Deedle.

I'd be happy with this but perhaps also: