dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Better support for mypy. #6496

Open trivialfis opened 3 years ago

trivialfis commented 3 years ago

Many Python projects, such as dask and optuna, now use Python type hints. As the xgboost Python package gains more and more features, we should also adopt mypy as a safeguard against some type errors and for better code documentation.
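To illustrate the kind of error mypy guards against, here is a minimal sketch using a hypothetical `set_param` helper (it is not part of xgboost's API). The annotations document which types are accepted, and mypy uses them to check every call site before the code runs.

```python
from typing import Dict, Union

# Hypothetical alias and helper for illustration only, not xgboost code.
ParamValue = Union[int, float, str]


def set_param(
    params: Dict[str, ParamValue], key: str, value: ParamValue
) -> Dict[str, ParamValue]:
    """Return a copy of ``params`` with ``key`` set to ``value``."""
    updated = dict(params)
    updated[key] = value
    return updated


params = set_param({}, "max_depth", 6)
params = set_param(params, "eta", 0.3)

# mypy (not the Python runtime) would reject the following call, because
# ``key`` is annotated as ``str`` but an ``int`` is passed:
#     set_param(params, 1, "hist")
```

At runtime the bad call would silently store an integer key; with the annotations in place, mypy reports it as an incompatible argument type instead.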

trivialfis commented 3 years ago

I marked this as a good first issue. Adding type checks for Python is an incremental process, done parameter by parameter and function by function. It is also a good chance to review the existing code base and make improvements along the way. I have made some progress on the dask module in https://github.com/dmlc/xgboost/pull/6519. Since dask is by far the most dynamic module in xgboost, other modules can use it as a baseline.
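The incremental approach works because, by default, mypy does not type-check the body of a function that has no annotations (the `--check-untyped-defs` flag changes this). A minimal sketch with hypothetical functions shows how typed and untyped code can coexist while a module is converted one function at a time:

```python
from typing import List, Optional


# Untyped legacy function: without annotations, mypy skips checking its body
# by default, so it can stay as-is until someone gets to it.
def legacy_scale(values, factor):
    return [v * factor for v in values]


# Newly annotated function: mypy now checks both its body and its callers.
def scale(values: List[float], factor: Optional[float] = None) -> List[float]:
    """Multiply each value by ``factor`` (defaults to 1.0)."""
    if factor is None:
        # After this assignment mypy narrows ``factor`` from Optional[float]
        # to float, so the multiplication below type-checks.
        factor = 1.0
    return [v * factor for v in values]
```

Each converted function tightens the checks a little, which keeps individual PRs small and easy to review.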

Contributions are welcome!

jjwang01 commented 2 years ago

Hi, I was wondering if I could work on this as a first issue. Thanks!

trivialfis commented 2 years ago

@jjwang01 Please do. ;-) ping me if you need any help

jjwang01 commented 2 years ago

Is it alright if I split this into multiple PRs?

trivialfis commented 2 years ago

@jjwang01 As many as you need. In fact, it's a good idea to have multiple PRs so we can communicate better around details.

jjwang01 commented 2 years ago

@trivialfis Is there some sort of local build script that I can use to test locally, rather than only after pushing a commit to a PR?

jjwang01 commented 2 years ago

Hi, just wanted to follow up on this comment.

trivialfis commented 2 years ago

Hi, sorry for missing the message.

  1. Install mypy in your Python environment.
  2. cd python-package in the xgboost source directory.
  3. Run mypy .
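If you prefer to script these steps, a minimal sketch could look like the following. It assumes mypy is already installed in the active environment (step 1) and that you pass the root of an xgboost source checkout; the helper names `mypy_command` and `run_mypy` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def mypy_command() -> List[str]:
    """Equivalent of ``mypy .``, run with the current Python interpreter."""
    return [sys.executable, "-m", "mypy", "."]


def run_mypy(xgboost_src: Path) -> int:
    """Run mypy inside ``<xgboost_src>/python-package`` (steps 2 and 3).

    Returns mypy's exit code, which is non-zero when it reports errors.
    """
    package_dir = xgboost_src / "python-package"
    result = subprocess.run(mypy_command(), cwd=package_dir, check=False)
    return result.returncode
```

For example, `run_mypy(Path("/path/to/xgboost"))` (a placeholder path) prints mypy's report and returns 0 when the package type-checks cleanly.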
trivialfis commented 2 years ago

The only module still lacking mypy support is now the pyspark interface.

trivialfis commented 2 years ago

Notes:

trivialfis commented 1 year ago

We will not cover pyspark in 2.0: mypy is not ready for pyspark's deep inheritance style.

michael-gendy-mention-me commented 1 year ago

Is there anything else to do here for now? I'm referring to the comments "The only module that's lacking the support for mypy is the pyspark interface now" followed by "Will not cover pyspark at 2.0". If there are any more places where this is needed, I'm happy to pick it up.

trivialfis commented 1 year ago

I think we will keep it open until xgboost is fully covered, including the pyspark module. You are more than welcome to work on it. I think there are still some parts of the pyspark interface that can be typed.

SANTHOSH-MAMIDISETTI commented 1 year ago

I am new to open source and am willing to work on this, provided someone helps me understand what has to be done. @trivialfis, is this still ongoing? Or are there any other issues that I might be able to work on?

trivialfis commented 1 year ago

Hi, thank you for volunteering @SANTHOSH-MAMIDISETTI! Currently, XGBoost has type hints for the core Python library, but not yet for most of the demos and tests. In addition, the PySpark module is not yet typed. Feel free to pick an untyped file and start adding type hints.

SANTHOSH-MAMIDISETTI commented 1 year ago

Dear @trivialfis ,

Thank you for your prompt response. I am excited to contribute to the XGBoost project and would like to request assignment for the issue regarding adding type hints to the PySpark module, as mentioned earlier.

As a beginner in open source, I anticipate that I may encounter challenges along the way. Therefore, I would appreciate some guidance on whom I can reach out to for assistance when faced with difficulties. It would be reassuring to know that there is a support system in place to help newcomers like myself navigate any obstacles that may arise during the contribution process.

Thank you once again for the opportunity to contribute. I am eager to make a meaningful impact on the project.

Sincerely, @SANTHOSH-MAMIDISETTI

trivialfis commented 1 year ago

Hi @SANTHOSH-MAMIDISETTI , there's an on-going effort for the pyspark module here: https://github.com/dmlc/xgboost/pull/9156

"I would appreciate some guidance on whom I can reach out to for assistance when faced with difficulties."

Feel free to ping me or @hcho3 for assistance when needed. I should be able to reply unless I'm away from my working devices.

"As a beginner in open source,"

Hope that you enjoy the journey!

michael-gendy-mention-me commented 1 year ago

Looks like I was too slow off the mark 😄. Is there anything left to type for the pyspark module, or elsewhere for that matter? As far as I can tell, it's all done.

trivialfis commented 1 year ago

Tons! Examples under the demo directory and tests under the tests directory. Also, the core package contains many uses of "# type: ignore" and Any; removing them can help make the code more rigorous.
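As a minimal sketch of what removing `Any` can look like, consider the two hypothetical functions below (they are not taken from xgboost's code base). With `Any`, mypy accepts every argument and every use of the result, so a bug can hide; a precise signature plus `isinstance` narrowing lets mypy check the code without needing `Any` or `# type: ignore`.

```python
from typing import Any, List, Sequence, Union


# Before: ``Any`` disables checking. Passing a single string silently
# splits it into characters, e.g. "f0" becomes ["f", "0"].
def feature_names_loose(names: Any) -> Any:
    return list(names)


# After: a precise signature. ``isinstance`` narrows the Union, so mypy
# knows ``names`` is a Sequence[str] in the final line.
def feature_names_strict(names: Union[str, Sequence[str]]) -> List[str]:
    if isinstance(names, str):
        return [names]
    return list(names)
```

The typed version handles the single-string case explicitly, and callers now get a `List[str]` that mypy can check downstream instead of an unchecked `Any`.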

michael-gendy-mention-me commented 1 year ago

Thanks a lot! I will contribute to this PR and try to remove the "# type: ignore" hints. This will be my first attempt at an open source contribution, so I would value some guidance on whether I'm on the right track.