I ran the models with two options for the distance feature: 1) the distance from Google Maps, and 2) the geodesic distance from the geopy library. Here are the best results I got:
Best model using Google Maps distance: Random Forest with 95% accuracy
Best model using geodesic distance: XGBoost with 89% accuracy
Should I report the results from both methods and explain the trade-off? For example: using the geodesic distance weakens the classifier, but in a real large-scale project with thousands of records it would be impractical to manually obtain the optimal distances from Google Maps due to time constraints. Note that geopy only provides straight-line distances (geodesic and great-circle), not road distances.
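For context, here is a minimal sketch of what a straight-line distance feature looks like. It is not my actual pipeline: it uses the haversine (great-circle) formula on a spherical Earth as a dependency-free stand-in, whereas geopy's `geodesic` works on an ellipsoid, so the two differ slightly. The function name `haversine_km` and the radius constant are assumptions made for illustration.

```python
import math

# Mean Earth radius in km (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Hypothetical helper: a spherical approximation, not geopy's
    ellipsoidal geodesic and not a road distance.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Either straight-line measure ignores the road network, so it will generally underestimate the travel distance that Google Maps reports, which may be part of why the two feature sets give different accuracy.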