Closed JanWosnitza closed 8 years ago
Finally I read the error when compiling Deedle.BigDemo m(
Unexpected exception from provided type 'FSharp.Azure.StorageTypeProvider.AzureTypeProvider,accountName="
<Insert your storage connection>
"+Domain+Tables' member 'GetMethods': The type provider 'ProviderImplementation.AzureTypeProvider' reported an error:Unable to connect to the remote server
[C:\Users\JanWosnitza\Desktop\Deedle.BigDemo-master\src\Deedle.BigSources\Deedle.BigSources.f
I knew I hadn't entered the storageConnection string, but +insert random excuse+.
I did a talk about BigDeedle recently at NDC Oslo which has been recorded. The source code has some additional instructions on getting the BigDeedle demo to work.
That said, BigDeedle has been designed mainly for accessing time-series data, doing some interactive exploration and then fetching a subset of the data into memory. It does not currently implement external sorting - it can sort values by a key (e.g. time in case of time series) and perform various operations restricting the range of the series. So, I'm not sure if it directly fits your needs here - an alternative is to look at MBrace, which is a more general F# big data processing library.
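To make the "restrict the range first, fetch afterwards" idea concrete, here is a minimal sketch in plain Python. It is not BigDeedle's API or implementation: `LazySeries`, `between`, `materialize` and `load_values` are hypothetical names invented for this illustration, and the "storage" is just a function that records which ranges were read. It assumes the keys are already sorted, which is what makes range restriction cheap.

```python
from bisect import bisect_left, bisect_right


class LazySeries:
    """Hypothetical stand-in for a big series: sorted keys plus a loader
    that fetches values by position only when asked."""

    def __init__(self, keys, load_values, start=0, stop=None):
        self.keys = keys                # sorted keys (e.g. timestamps)
        self.load_values = load_values  # callable (start, stop) -> list of values
        self.start = start
        self.stop = len(keys) if stop is None else stop

    def between(self, lo, hi):
        """Restrict to lo <= key <= hi without touching the values."""
        new_start = bisect_left(self.keys, lo, self.start, self.stop)
        new_stop = bisect_right(self.keys, hi, self.start, self.stop)
        return LazySeries(self.keys, self.load_values, new_start, new_stop)

    def materialize(self):
        """Fetch only the selected range into memory."""
        values = self.load_values(self.start, self.stop)
        return list(zip(self.keys[self.start:self.stop], values))


loads = []  # records which position ranges were actually read


def load_values(start, stop):
    # Stands in for an expensive read from disk or remote storage.
    loads.append((start, stop))
    return [position * 10 for position in range(start, stop)]


series = LazySeries(list(range(100)), load_values)

# Two range restrictions, then a single fetch of the remaining slice.
subset = series.between(10, 50).between(20, 22).materialize()
print(subset)
print(loads)
```

In this sketch the two `between` calls only do binary searches over the keys; the loader runs once, for the final three-element slice. Anything that needs the values in a different order (external sorting, for instance) falls outside what this pattern covers.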
Ok, so I will definitely have a look at MBrace, thanks. Btw, nice talk, as always :+1:
@tpetricek Your link to MBrace was broken: www.mbrace.io Thanks for the link to your talk. Has it been transcribed anywhere to save me time? I'm a much faster note taker when I have a transcript than listening to a video.
@jzabroski Alas, I think MBrace is no longer an active project. The internals of BigDeedle do not depend on it, but the demo I did in the talk does. I don't think there is a transcript, sadly.
In general, I'd be happy to help anyone interested in bringing BigDeedle back to life & use it, but I don't have much availability to work on that on my own right now.
I have to work with huge data (100GB to 2TB stored in HDF5 files). The data consists of partitioned columns with a shared index. I haven't found any Deedle examples at that order of magnitude yet, besides Deedle.BigDemo (which didn't compile). Since our data does not fit into RAM, my question is:
Is Deedle (already) able to handle big data?
Or more specifically: given a Frame containing the whole data, is it possible to
The biggest overhead is reading the data from disk. So ideally the data shouldn't be touched too often (e.g. filter values on the fly instead of doing it immediately).
Cheers Jan Wosnitza
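As a rough illustration of the "filter on the fly" idea from the question, the sketch below chains lazy steps over chunks so that each chunk is read once and nothing is read until a result is requested. It uses plain Python generators, not Deedle or an HDF5 library: `read_chunks`, `lazy_filter` and `lazy_map` are hypothetical helpers, and an in-memory list stands in for a column on disk.

```python
chunks_read = []  # records the offset of every chunk that was actually read


def read_chunks(column, chunk_size):
    """Hypothetical reader: yields one chunk at a time (stands in for disk I/O)."""
    for offset in range(0, len(column), chunk_size):
        chunks_read.append(offset)
        yield column[offset:offset + chunk_size]


def lazy_filter(chunks, predicate):
    """Yield matching values chunk by chunk, without keeping chunks around."""
    for chunk in chunks:
        for value in chunk:
            if predicate(value):
                yield value


def lazy_map(values, transform):
    """Apply a transformation to each value as it streams through."""
    for value in values:
        yield transform(value)


column = list(range(1, 11))  # stand-in for a column that would not fit in RAM

# Building the pipeline reads nothing; the work happens when it is consumed.
pipeline = lazy_map(
    lazy_filter(read_chunks(column, chunk_size=4), lambda v: v % 2 == 0),
    lambda v: v * v,
)

total = sum(pipeline)  # single pass: each chunk is read exactly once
print(total, chunks_read)
```

The filter and the map are fused into one pass here, so the data is touched once regardless of how many steps are chained. Whether a given Frame library evaluates operations this way is a separate question that this sketch does not answer.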