DSML-Research-Group / public-projects

GitHub Repository
GNU General Public License v3.0

[PROJECT IDEA] Add Double Lasso to Python Ecosystem #4

Open jaredDlewis opened 1 year ago

jaredDlewis commented 1 year ago

Main Project Goal

Add Double Lasso functionality for feature selection to the Python ecosystem. This could be done as a standalone library, but it would have a much bigger impact on the Python community if it were incorporated into the open-source statsmodels library.

Why This Project is Interesting

Using Double Lasso for feature selection is a statistically sound way to produce a parsimonious linear regression model for statistical inference, but it is not currently implemented in any easy-to-find, publicly available Python library. It is available to Stata users, so open-sourcing the functionality would be a great addition to the Python ecosystem for statisticians and researchers who don't have a Stata license. There are also some interesting technical challenges (listed below) that would let the developer of this functionality dive into the front lines of statistical methods.

Brief Description of Work Involved

Add a tool to the Python ecosystem that implements Double Lasso, similar to Stata's current implementation. You should also look at the options provided in Stata's Lasso implementation and compare them with the options in statsmodels to see what kind of functionality you can add. Some notable challenges:

First Steps

JonathanBechtel commented 1 year ago

@jaredDlewis Thanks. Question: is the Group Lasso segment in addition to Double Lasso, or a part of it? I ask because there's already a Group Lasso implementation in Python: https://group-lasso.readthedocs.io/en/latest/