JuliaStats / GLM.jl

Generalized linear models in Julia
Other
584 stars 114 forks source link

Implementation of `dropcollinear` feature in `GeneralizedLinearModel` #488

Closed mousum-github closed 1 year ago

mousum-github commented 2 years ago

This pull request supersedes #340

We are looking for the dropcollinear feature in GLM and found #340. The PR, which opened in Oct'2019, looks very close to complete.

Apart from the changes covered in PR 340, we have added

We have performed logistic regression with 10 independent variables, 10,000 rows and dropcollinear feature true and false; the dataset does not have a multicollinearity issue. The machine configuration is as follows: - OS – Linux, CPU - AMD Ryzen 5 3600 6-Core Processor RAM – 64 GB

With dropcollinear = true: average time 4.381 ms With dropcollinear = false: average time 4.396 ms

codecov-commenter commented 2 years ago

Codecov Report

Base: 87.08% // Head: 87.32% // Increases project coverage by +0.24% :tada:

Coverage data is based on head (07e44de) compared to base (e2e9c85). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #488 +/- ## ========================================== + Coverage 87.08% 87.32% +0.24% ========================================== Files 7 7 Lines 929 947 +18 ========================================== + Hits 809 827 +18 Misses 120 120 ``` | [Impacted Files](https://codecov.io/gh/JuliaStats/GLM.jl/pull/488?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats) | Coverage Δ | | |---|---|---| | [src/glmfit.jl](https://codecov.io/gh/JuliaStats/GLM.jl/pull/488/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats#diff-c3JjL2dsbWZpdC5qbA==) | `81.01% <100.00%> (+0.10%)` | :arrow_up: | | [src/linpred.jl](https://codecov.io/gh/JuliaStats/GLM.jl/pull/488/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats#diff-c3JjL2xpbnByZWQuamw=) | `85.61% <100.00%> (+2.00%)` | :arrow_up: | | [src/lm.jl](https://codecov.io/gh/JuliaStats/GLM.jl/pull/488/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats#diff-c3JjL2xtLmps) | `93.33% <100.00%> (-0.05%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

ViralBShah commented 1 year ago

@mousum-github Can you fix the failing doctests?

mousum-github commented 1 year ago

Hi @nalimilan, Covered most of your suggestions. But there is a conflict in glmfit.jl file and unable check whether all checks are passed or not. Sorry for the delay in response. cc: @ViralBShah

nalimilan commented 1 year ago

Thanks. I can have a look at the conflict once the PR is ready if you can't fix it. Just ensure tests pass on your side for now.

nalimilan commented 1 year ago

I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb to apply my merge commit to your branch.

ayushpatnaikgit commented 1 year ago

I've a branch which fixes the conflict with master but I can't push the changes to your branch. Can you give me the necessary permissions? See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork. Otherwise you can do git fetch && git cherry-pick 998393f9d31b29a3bde77373c2fbbfd8844353cb to apply my merge commit to your branch.

Hi Milan, Since we have done the PR from an organisation account, GitHub doesn't permit us to allow edits from maintainers. Can we please add you as an outside collaborator to xKDR?

ViralBShah commented 1 year ago

May I suggest we give @mousum-github commit access here, with the usual rules of working through PRs like we have everyone else?

bkamins commented 1 year ago

May I suggest we give @mousum-github commit access here

This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).


Regarding commit access (as it is an independent issue).

If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.

However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).

Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?

ViralBShah commented 1 year ago

May I suggest we give @mousum-github commit access here

This will not resolve the issue. It is the reverse that is needed (i.e. @nalimilan needs to have access to PR source repository).

Regarding commit access (as it is an independent issue).

If someone from xKDR team (or the whole team) were willing to work on improving GLM.jl in long term it would be really great. There are many open issues in GLM.jl currently + there are loads of things that could be added.

However, is that it is not the person who makes a PR, but a person who reviews it that needs commit access. Currently mostly @nalimilan serves as a reviewer on this repository (and @nalimilan - thank you for that). Of course we could change that in the future (there are many other packages that need attention by @nalimilan so having someone else oversee GLM.jl after some transition period, during which the person would learn JuliaStats standards, would be great).

Given these considerations @mousum-github - would you find having commit access to GLM.jl useful?

Right - my suggestion was indirect - so that @mousum-github can then make future PRs using branches on this repo, and perhaps even move this repo from a fork to a branch. That would make it easier for @nalimilan and everyone to edit directly and what not. And this would also serve as a first step towards having more contributors such as @mousum-github overseeing this repo.

mousum-github commented 1 year ago

I am sure that the commit access will be very useful, and I am happy to help maintain the project going forward.

nalimilan commented 1 year ago

Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.

Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!

mousum-github commented 1 year ago

Thanks @ayushpatnaikgit. Not sure why GitHub doesn't support that, but thanks to the access to the xKDR repo I could push my merge commit. Tests still appear to fail though.

Usually we only give commit access to people who have made several PRs over a relatively long period, but given that xKDR is invested in improving the stats ecosystem I guess we can accelerate that process. I've just sent an invitation to @mousum-github. Help reviewing other PRs would of course be welcome!

Thanks @nalimilan

mousum-github commented 1 year ago

Hi @nalimilan, We have updated the code and documentation.

We have updated the newly added nulldeviance and nullloglikelihood functions in glm. Passed the dropcollinear argument value to fit function for null models.

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

nalimilan commented 1 year ago

We also added a test case to check the PosDefException exception when dropcollinear is false for a rank-deficient model. Unfortunately, the test case was not passed in a few systems. So, we have commented the test case for timing.

Maybe another example would pass tests more reliably? We really need to check that path. How about trying with categorical variables, like in the "rankdeficient" testset for linear models?

mousum-github commented 1 year ago

Okay. Let me check.

mousum-github commented 1 year ago

Okay. Let me check.

Added two test cases similar to rankdeficient.