Closed thequackdaddy closed 6 years ago
@mrocklin Not trying to be pushy, but do you have thoughts on this? I've found castra to be super fast in my little application when I provide categoricals. However, the divisions are not consistent with your definition from blaze.
I guess a much simpler way of getting the same effect would be to have dask ignore the divisions that castra provides and generate its own divisions by opening every partition and getting value of the first index?
Thoughts?
cc @jcrist
@thequackdaddy you needn't worry about being pushy. I'm quite glad that you've identified this issue and gladder still that you're working to fix it. My apologies about being lax in review lately.
In principle the changes you propose seem to be correct. I'd like to take a bit longer to review them though. I'd also like to have @jcrist take a look as he has used this codebase more significantly than I have in recent history. He may be a bit occupied at the moment though.
@thequackdaddy what is your timeframe on this? Is it valuable to you to get this in and released in the short term?
I have no timeframe. Maybe a month or so? I just wasn't trying to take over your projects. Castra is very exciting to me.
With that said, I wasn't 100% sure if the auto responder was working. I'll let @jcrist do this on a schedule that keeps the tools high quality.
On Mar 22, 2016, at 4:18 PM, Matthew Rocklin notifications@github.com wrote:
@thequackdaddy you needn't worry about being pushy. I'm quite glad that you've identified this issue and gladder still that you're working to fix it. My apologies about being lax in review lately.
In principle the changes you propose seem to be correct. I'd like to take a bit longer to review them though. I'd also like to have @jcrist take a look as he has used this codebase more significantly than I have in recent history. He may be a bit occupied at the moment though.
@thequackdaddy what is your timeframe on this? Is it valuable to you to get this in and released in the short term?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub
Oh, you are entirely welcome to take over any project you like! I'm well beyond saturated with work.
Hello,
See this: https://github.com/dask/dask/issues/1057
Well this grew much more than I thought it was going to.
Castra appears to use the last index in a partition to population the divisions variable. I tried to change that, but then realized I basically could do it easier if I changed partitions.
So here it is...
This might not be the best way to get this to work, but it works.
Feel free to ignore.