fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

.NET 4.0 support #28

Closed: buybackoff closed this issue 10 years ago

buybackoff commented 11 years ago

This is a great and long awaited library, but could it target .NET 4.0?

It is quite trivial to replace IReadOnlyList with ReadOnlyCollection (done here: https://github.com/buybackoff/FSharp.DataFrame/commit/7e0b84c096ab3a27a55fc0658c832555cd65f269; all tests pass).

However, the modules FrameUtils and FrameExtentions are tightly coupled with FSharp.Data.DesignTime, which they use for type inference from a TextReader. The ReadCSV method is used in the tests, but the data supplied there is a .csv file. As I understand it, the runtime FSharp.Data could infer types from sample files, but in FrameUtils the data is supplied as a TextReader.

This SO question says one doesn't need the DesignTime reference and can simply delete it, but that does not work in this case: http://stackoverflow.com/questions/19214044/is-fsharp-data-designtime-net-4-5-only

Perhaps the CSV parsing utility and extensions should not be part of the DataFrame itself, but should live in the tests or samples instead? I am quite happy with the Frame constructor alone: I could easily construct the columns myself and call the constructor as on the last line of FrameUtils: Frame(rowIndex, columnIndex, Vector.ofValues columns).

tpetricek commented 11 years ago

I think supporting .NET 4.0 is a fairly reasonable request. Can you submit your change to use ReadOnlyCollection as a pull request (without the DotSetting.user file)?

The library currently depends on the CSV parser (that's okay) and on type inference (which is only available in FSharp.Data.DesignTime.dll, because the CSV type provider only needs it at design time).

This is a bit of a silly dependency. I think we need an easy way to read CSV files directly in the data frame library (people use this all the time), but we do not really need F# Data, just the CSV parser and the inference.

I'm not sure what the best option is. I suppose these might be included in some common shared base data science library (at some point), but for now we can probably just copy and paste the relevant files (perhaps after some refactoring to reduce this to a file or two).

Do you need this urgently for some project (or do we have some time to refactor & plan how to best do this)?

ovatsus commented 11 years ago

I think we could move the type inference to the runtime component of FSharp.Data. It is being used at runtime here, and nothing there depends on 4.5.

tpetricek commented 11 years ago

I was also wondering if it would make sense to separate the CSV type inference from the structural type inference (to some extent). They could still share some basic functionality, but the two are quite different in nature, so perhaps that would make the code easier to follow, extend, and improve.

buybackoff commented 11 years ago

Submitted a pull request.

"Do you need this urgently for some project (or do we have some time to refactor & plan how to best do this)?"

This is not urgent, and I could strip away all the .csv stuff and the 4.5 references myself in the meantime. Going forward, I hope to migrate completely from my current lame usage of SortedList<K1, SortedList<K2, V>> to this library.

tpetricek commented 10 years ago

The latest release targets .NET 4.0!