Open johnnyzhon opened 1 year ago
Thanks @lijinf2 for the information. Should we compare the coefficients between the GPU (spark_rapids_ml) model and the CPU (pyspark) model? Is there a way to do that?
Yes, there is a way. In fact, test_compat already does this: it compares the coefficients of the GPU model and the CPU model on the VectorUDT input type. Currently, every algorithm has a similar "test_compat" test case that ensures compatibility with the CPU implementation on the VectorUDT type. I may be wrong, but it seems the CPU algorithm accepts only the VectorUDT input type.
test_toy_example demonstrates the array input type, which works because the GPU implementation is not restricted to VectorUDT. Spark RAPIDS ML supports three input types: VectorUDT, array, and multi-cols. The relevant information is in test_classifier.
Perhaps we should move test_compat up so that it is the first test case in test_logistic_regression.py. Let me know if this looks better from an end user's point of view.
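For the coefficient comparison discussed above, here is a minimal sketch of a tolerance-based check. It is not the code in test_compat; the helper name `coefficients_match` and the tolerance values are hypothetical, and it assumes both models expose their coefficients as something convertible to a plain sequence of floats (for example via `model.coefficients.toArray()` on a pyspark model). The first list reuses the values from the test_toy_example assertion; the second list is a made-up perturbation for illustration only.

```python
import math
from typing import Sequence


def coefficients_match(
    cpu_coef: Sequence[float],
    gpu_coef: Sequence[float],
    rel_tol: float = 1e-6,  # hypothetical tolerance, tune per algorithm
    abs_tol: float = 1e-6,  # hypothetical tolerance, tune per algorithm
) -> bool:
    """Return True if both coefficient vectors agree element-wise within tolerance."""
    # Vectors of different length can never describe the same model.
    if len(cpu_coef) != len(gpu_coef):
        return False
    return all(
        math.isclose(c, g, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, g in zip(cpu_coef, gpu_coef)
    )


# Values from the test_toy_example assertion quoted in this thread.
cpu_coef = [-2.42377087, 2.42377087]
# Made-up values standing in for a second model's slightly different result.
gpu_coef = [-2.42377090, 2.42377085]

print(coefficients_match(cpu_coef, gpu_coef))
```

An exact comparison such as array_equal can fail on tiny floating-point differences between two solvers, so a tolerance is usually the safer choice when comparing CPU and GPU results.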
https://github.com/NVIDIA/spark-rapids-ml/blob/branch-23.08/python/tests/test_logistic_regression.py#L306C9-L306C68
assert array_equal(coefficients, [-2.42377087, 2.42377087])
two concerns: