eordentlich opened this issue 8 months ago
Describe the bug
I'm not sure whether this applies to all examples, but the mortgage ETL + XGBoost example has some non-trivial discrepancies between its Python script and its notebooks. For example, the Python script implements some transformations as UDFs: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py#L22-L23
while the notebook(s) implement the same logic directly in Spark SQL: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL.ipynb?short_path=2af22cf#L454-L478
There are other differences as well. It looks like the scripts may be lagging behind the notebooks.
Steps/Code to reproduce bug
N/A
Expected behavior
The notebook and Python script versions should ideally be aligned, or the reasons they differ should at least be documented.
Environment details
N/A
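As a hedged illustration of how a UDF and its Spark SQL rewrite can quietly diverge, here is a minimal pure-Python sketch (no PySpark). The mapping, seller names, and function names (`NAME_MAPPING`, `standardize_udf_style`, `standardize_sql_style`, `find_mismatches`) are hypothetical and are not taken from etl.py or MortgageETL.ipynb. It assumes one version does a plain dictionary lookup, so unmapped names become null, while the other mimics `CASE WHEN ... ELSE name END`, so unmapped names pass through; it then runs both over the same inputs and reports where they disagree.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical raw-to-standardized seller name mapping.
# These entries are invented for illustration; they are NOT from etl.py.
NAME_MAPPING = {
    "ACME MORTGAGE, INC.": "Acme",
    "EXAMPLE BANK, N.A.": "Example Bank",
}


def standardize_udf_style(name: Optional[str]) -> Optional[str]:
    """Row-at-a-time dict lookup, as a Python UDF might do it.
    Names missing from the mapping come back as None (null)."""
    return NAME_MAPPING.get(name)


def standardize_sql_style(name: Optional[str]) -> Optional[str]:
    """Mimics CASE WHEN name = raw THEN std ... ELSE name END:
    names missing from the mapping are returned unchanged."""
    for raw, std in NAME_MAPPING.items():
        if name == raw:
            return std
    return name


def find_mismatches(
    rows: Iterable[Optional[str]],
    impl_a: Callable[[Optional[str]], Optional[str]],
    impl_b: Callable[[Optional[str]], Optional[str]],
) -> List[Optional[str]]:
    """Return the inputs on which the two implementations disagree."""
    return [r for r in rows if impl_a(r) != impl_b(r)]


if __name__ == "__main__":
    sample = ["ACME MORTGAGE, INC.", "EXAMPLE BANK, N.A.", "OTHER LENDER LLC", None]
    print(find_mismatches(sample, standardize_udf_style, standardize_sql_style))
```

Running it prints `['OTHER LENDER LLC']`: both versions agree on mapped names and on null, but differ on an unmapped name. For the real examples, a comparable check could run the script and the notebook ETL on the same input and compare the resulting DataFrames.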
@nvliyuan Do you remember who wrote these examples? I can't recall the reason, but there should be one.
Yes, different implementations of the same example should share the same logic. I will draft a PR to fix it.