fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Is there any progress on BigDeedle? #353

JanWosnitza closed 8 years ago

JanWosnitza commented 8 years ago

I have to work with huge data (100 GB to 2 TB stored in HDF5 files). The data consists of partitioned columns with a shared index. I haven't found any Deedle examples at that order of magnitude yet, besides Deedle.BigDemo (which didn't compile). Since our data does not fit into RAM, my question is:

Is Deedle (already) able to handle big data?

Or, more specifically: given a Frame containing the whole data, is it possible to

The biggest overhead is reading the data from disk, so ideally the data shouldn't be touched too often (e.g. filter values on the fly instead of doing it immediately).

Cheers, Jan Wosnitza

JanWosnitza commented 8 years ago

Finally I read the error when compiling Deedle.BigDemo m(

Unexpected exception from provided type 'FSharp.Azure.StorageTypeProvider.AzureTypeProvider,accountName="<Insert your storage connection>"+Domain+Tables' member 'GetMethods': The type provider 'ProviderImplementation.AzureTypeProvider' reported an error: Unable to connect to the remote server [C:\Users\JanWosnitza\Desktop\Deedle.BigDemo-master\src\Deedle.BigSources\Deedle.BigSources.f

I knew I hadn't entered the storageConnection string, but +insert random excuse+.

tpetricek commented 8 years ago

I recently gave a talk about BigDeedle at NDC Oslo, which was recorded. The source code has some additional instructions on getting the BigDeedle demo to work.

That said, BigDeedle has been designed mainly for accessing time-series data, doing some interactive exploration and then fetching a subset of the data into memory. It does not currently implement external sorting - it can sort values by a key (e.g. time in case of time series) and perform various operations restricting the range of the series. So, I'm not sure if it directly fits your needs here - an alternative is to look at MBrace, which is a more general F# big data processing library.
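To make the "restrict the range, then fetch a subset into memory" idea concrete, here is a rough sketch in Python. It is not BigDeedle's or Deedle's API: `LazySeries`, `between`, `materialize` and `load_value` are all hypothetical names, and the sketch assumes the sorted keys fit in memory while the values stay in storage. Range restriction only narrows a window with binary search; values are read only when the subset is materialized.

```python
from bisect import bisect_left, bisect_right


class LazySeries:
    """Hypothetical key-sorted series that defers reading values (not Deedle's API)."""

    def __init__(self, keys, load_value, lo=0, hi=None):
        self.keys = keys              # sorted keys, e.g. timestamps (assumed to fit in memory)
        self.load_value = load_value  # callback that reads the value at a position from storage
        self.lo = lo                  # window start (inclusive)
        self.hi = len(keys) if hi is None else hi  # window end (exclusive)

    def between(self, first, last):
        # Narrow the window by binary search on the keys; no values are read here.
        lo = bisect_left(self.keys, first, self.lo, self.hi)
        hi = bisect_right(self.keys, last, self.lo, self.hi)
        return LazySeries(self.keys, self.load_value, lo, hi)

    def materialize(self):
        # Only now are values fetched, and only for the restricted window.
        return [(self.keys[i], self.load_value(i)) for i in range(self.lo, self.hi)]


reads = []  # records which positions were actually read from "storage"


def load_value(i):
    reads.append(i)
    return i * 10  # stand-in for an expensive disk or network read


series = LazySeries(list(range(0, 1000, 5)), load_value)
subset = series.between(100, 120).between(105, 115)
rows = subset.materialize()
print(rows)
print(reads)
```

Chaining `between` calls costs nothing in terms of reads; in this toy run only three of the 200 positions are ever loaded. Operations that need to see all values, such as sorting by value, do not fit this scheme, which is the limitation mentioned above.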

JanWosnitza commented 8 years ago

OK, so I will definitely have a look at MBrace, thanks. Btw, nice talk, as always :+1:

jzabroski commented 5 years ago

@tpetricek Your link to MBrace was broken: www.mbrace.io Thanks for the link to your talk. Has it been transcribed anywhere to save me time? I take notes much faster from a transcript than from a video.

tpetricek commented 5 years ago

@jzabroski Alas, I think MBrace is no longer an active project. The internals of BigDeedle do not depend on it, but the demo I did in the talk does. I don't think there is a transcript, sadly.

In general, I'd be happy to help anyone interested in bringing BigDeedle back to life & use it, but I don't have much availability to work on that on my own right now.