blaze / castra

Partitioned storage system based on blosc. **No longer actively maintained.**
BSD 3-Clause "New" or "Revised" License
153 stars 21 forks source link

BUG: Castra divisions/partitions #60

Closed thequackdaddy closed 6 years ago

thequackdaddy commented 8 years ago

Hello,

See this: https://github.com/dask/dask/issues/1057

Well this grew much more than I thought it was going to.

Castra appears to use the last index in a partition to population the divisions variable. I tried to change that, but then realized I basically could do it easier if I changed partitions.

So here it is...

This might not be the best way to get this to work, but it works.

Feel free to ignore.

thequackdaddy commented 8 years ago

@mrocklin Not trying to be pushy, but do you have thoughts on this? I've found castra to be super fast in my little application when I provide categoricals. However, the divisions are not consistent with your definition from blaze.

I guess a much simpler way of getting the same effect would be to have dask ignore the divisions that castra provides and generate its own divisions by opening every partition and getting value of the first index?

Thoughts?

mrocklin commented 8 years ago

cc @jcrist

mrocklin commented 8 years ago

@thequackdaddy you needn't worry about being pushy. I'm quite glad that you've identified this issue and gladder still that you're working to fix it. My apologies about being lax in review lately.

In principle the changes you propose seem to be correct. I'd like to take a bit longer to review them though. I'd also like to have @jcrist take a look as he has used this codebase more significantly than I have in recent history. He may be a bit occupied at the moment though.

@thequackdaddy what is your timeframe on this? Is it valuable to you to get this in and released in the short term?

thequackdaddy commented 8 years ago

I have no timeframe. Maybe a month or so? I just wasn't trying to take over your projects. Castra is very exciting to me.

With that said, I wasn't 100% sure if the auto responder was working. I'll let @jcrist do this on a schedule that keeps the tools high quality.

On Mar 22, 2016, at 4:18 PM, Matthew Rocklin notifications@github.com wrote:

@thequackdaddy you needn't worry about being pushy. I'm quite glad that you've identified this issue and gladder still that you're working to fix it. My apologies about being lax in review lately.

In principle the changes you propose seem to be correct. I'd like to take a bit longer to review them though. I'd also like to have @jcrist take a look as he has used this codebase more significantly than I have in recent history. He may be a bit occupied at the moment though.

@thequackdaddy what is your timeframe on this? Is it valuable to you to get this in and released in the short term?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

mrocklin commented 8 years ago

Oh, you are entirely welcome to take over any project you like! I'm well beyond saturated with work.