fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
939 stars 197 forks source link

BigDeedle improvements #302

Closed tpetricek closed 9 years ago

tpetricek commented 9 years ago

This makes the addressing mechanism in BigDeedle a bit more flexible (as needed for internal BMC work). In particular:

NOTE: There is more cleanup to be done here, so no need to merge/review this yet.

tpetricek commented 9 years ago

I guess we should merge this into master at some point!

I have used this branch for the latest Deedle release (for both the use of BigDeedle at BMC and for public NuGet package) and my main reason for not merging this is that there are some places that I would like to clean up and that BigDeedle needs a public demo and documentation.

@adamklein @hmansell What do you think would be the best strategy? Should I write a summary of the changes I did that are affecting "small Deedle" so that you can look at those (and we merge it with BigDeedle work in progress)? Or do you think it is OK to wait a bit more and merge it once it is all nice? (I think I need ~2 days to just make everything a bit nicer).

hmansell commented 9 years ago

Probably making it nicer is better, IMO.

adamklein commented 9 years ago

Looked over the changes, whenever you want to merge this to master, no problems here.

tpetricek commented 9 years ago

OK, I think this is now ready to merge.

I did some more debugging while working on the public BigDeedle demo, which also revealed some interesting issues. The biggest one is that with BlockStore (and the public demo, which follows this pattern), the addresses we use are no longer compatible between multiple different representations of a vector (in-memory uses just index, but the address for BigDeedle means completely different thing).

This was correctly handled in most places, but there were still cases where wrong addressing scheme was used. This is now solved by marking vectors and indices with "addressing scheme" that's checked and enforced to match.

There is some comment about this in the design notes.

I also did a general cleanup of the BigDeedle code, so all the tricky things are now commented and it should actually be understandable & readable :)

tpetricek commented 9 years ago

I'm going to merge this - this just improves the existing PR that @adamklein reviewed and I also improved the documentation and cleaned up the code as I suggested and as @hmansell asked for.

The BigDeedle code is still quite complex, but I think merging it into master and sending other fixes and improvements as smaller PRs will be a better way to continue developing this.

(Also, I'll push a new release with the most recent improvements for the internal BMC needs.)