Closed sebhofer closed 5 years ago
@sebhofer I checked Pandas' behavior on similar case. It basically drops rows of the group column that has missing value first and then group by it. I would prefer that instead of throwing out an error message. I'll dig a bit to see how to accomplish this efficiently.
I'm not sure I agree. If you just drop some rows and move on quietly, the user might never notice that something is off. In an interactive environment, which is, I think, where Deedle is mostly used, I rather I get an error message, and I can fix the problem myself.
I think both throwing values away and throwing an exception are sensible design choices - I would have slight preference for exception, because I feel the F# style is to be a bit more explicit than in the Python style (it just means people have to do some extra work to drop before grouping, but that seems OK).
We might make that easier if we added Frame.dropSparseRowsBy "Column"
which would drop rows where the value in Column
is missing.
@tpetricek I'm ok with throwing error if users got the message that some data are missing so that they can check data and clean it up.
@sebhofer Could you submit another commit to throw a little bit more explicit message to suggest user to check missing data in grouping columns?
@zyzhu Sure, will do. I can also try to come up with a PR Frame.dropSparseRowsBy "Column"
if that's desired. I really like the idea @tpetricek
I had a quick look at the grouping function. As frame.GroupByLabels
is used by different grouping functions (I guess the relevant ones are GroupRowsBy
and GroupRowsUsing
), I think it would be most clear and most helpful to the user if there was a different error for each function. Additionally, I will still keep the check in GroupByLabels
to be safe. For each case, I will try to find a simple way to correct the problem and give a hint to the user in the error message.
@zyzhu please let me know if you don't like this approach.
@sebhofer That makes a lot of sense to me. Thanks!
Hi,
i noticed that you already made a PR for Frame.dropSparseRowsBy
, but you also add the function in this PR. Was this intended? There are also quite a lot of merge-commits. Would you mind rebasing/squashing this?
@kblohm No, that was not intended! I am new to forking projects and creating PRs so I really messed this up yesterday. Also, I didn't realize that everything I push to the master branch gets added to the PR. I'm not quite sure how to proceed with this. I'm guessing it's best to close this and open a new PR on a dedicated branch. Any input is welcome!
You could clean this one up, but as far as i know you can not change the source branch. I would just close this and create a branch for your changes and then open a new PR. If you need any help with this, let me know. :)
@sebhofer You could sync your fork and then create another branch. Then commit just the change of throwing exception https://github.com/fslaborg/Deedle/pull/380/commits/e85244dae559fbd5ed98bea623aecc4ba8d3a930 and submit the pull.
I'll merge it after that. BTW, @kblohm just changed the build using FAKE 5. After syncing with upstream, you need to do the following to get started.
You do not need to install fake globally, you can just run fake.cmd
/ fake.sh
and that will install it just for this project. I am also using the global version, but it is not a requirement.
@kblohm Yeah, that seems like the cleanest solution. I think I'll be able to handle it... but you never know. I'll let you know otherwise. Thanks for your help!
@zyzhu I'll try to do this tonight. On another note, did you see the issue I opened about working with the Deedle src? I'm struggling a bit with this and could use some advice. Or is this a trivial problem and I'm just missing something really obvious?
@sebhofer I did see that. I am traveling for work this week. Will write up some drafts next week when I get back to office. But I will need some help from @tpetricek as he knows the original work more inside out than me as new comer too.
Thanks to @kblohm, we are on FAKE 5 already. Just follow what he wrote above and you shall be able to build the source.
As for how to work with fsx and source code. I will try to write up some sample cases next week.
@zyzhu Perfect, thanks. I just wasn't sure if it was just me being stupid!
@kblohm I might take you up on offer afterall :) How do I best clean this up? As I now have all these merge-commits in both master
and origin/master
I'm worried that by rebasing I'll make things worse... or is that not a problem?
Did you mange to fix this? If you still need to cleanup your master branch, you can
git fetch upstream
git checkout master
git reset upstream/master --hard
This will reset all your changes, but it looks like you already have a new branch with the stuff you wanted to keep. To get this up to your origin, you will have to force push.
I've just merged the new pull.
@kblohm I did manage to create a new branch for PR, but I still didn’t know how to fix my master. So this is exactly what I needed, thx! Will try tomorrow.
@zyzhu Great, thank you!
Throw error when trying to group by column with missing values. This leads to unexpected results (see, e.g., this issue). This is not really a fix, but grouping on missing values seemsill-defined anyway.