fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
937 stars 195 forks source link

Reading CSV (#332) and DropSparseRows (#333) #334

Closed tpetricek closed 8 years ago

tpetricek commented 8 years ago

I changed how reading CSVs is done so that it does not allocate too much necessary memory. It is a bit faster and a bit less wasteful, but it is not as significant improvement as I'd be hoping for. More work would probably involve tweaking the CSV parser from F# Data.

I added test for DropSparseRows from @adamklein in #333. This seems to be working as expected. However, there was a recent(-ish) change in this bit (https://github.com/BlueMountainCapital/Deedle/commit/2a0fc636fb014432b56243870c5433a5fe59b50b), so perhaps you're not running the latest version?

adamklein commented 8 years ago

Is the DropSparseRows behavior working in 1.2.4? Seems not to work... but I thought the fix went in before 1.2.4 went out?

adamklein commented 8 years ago

NM, I see things work in 1.2.4. My projects are somehow still loading 1.2.3, configuration issue.

tpetricek commented 8 years ago

@adamklein Sorry for the mess here, I had this forgotten PR here. I suppose this is good to merge?

(This also now fixes the issue with CI builds.)

tpetricek commented 8 years ago

@adamklein Sorry, I made a bit of a mess here! This unmerged PR had an optimization that changed the code for reading CSV (to make it faster) - I adapted your changes to add Guid and DateTime support (but did it wrong the first time). Now it should all work again... we'll see if the CI finishes!

tpetricek commented 8 years ago

Okay, now we've got one genuine test failure in Can save MSFT data as CSV file and read it afterwards (with custom format) on CI - this might be related to the DateTime support. I'll see what's going on.

tpetricek commented 8 years ago

The test failure was indeed caused by the DateTime parsing.

Previously, we did not parse dates and so the test had an explicit call to DateTime.Parse. Now, the date gets recognized & parsed automatically, so we do not need that. This commit changes the test to make it work.

@adamklein All seems to be green now, so let me know if you're OK with merging this!

adamklein commented 8 years ago

Yes, let's merge! It's be great to release a minor version after these little fixes. Do you have any thoughts on the parenthesis issue in column names?