echarso opened this issue 2 years ago (status: Open)
Hi @echarso, Thank you for this question.
Currently, the tool works only with pandas DataFrames or CSV files (CLI version). This means that, for now, you can convert a Spark DataFrame to a pandas DataFrame and then run Evidently. In this scenario, working with a smaller data sample makes sense.
We are adapting the tool to larger amounts of data; this will be addressed in upcoming releases.
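For anyone trying the convert-then-run workaround described above, here is a minimal sketch of how the sampling step could look. It is not part of Evidently: `sample_fraction`, `spark_sample_to_pandas`, and the `target_rows` default are hypothetical names and values chosen for illustration. It only assumes the standard PySpark DataFrame methods `count()`, `sample()`, and `toPandas()`.

```python
def sample_fraction(total_rows: int, target_rows: int) -> float:
    """Return the fraction of rows to keep so the sample is roughly target_rows."""
    if total_rows <= target_rows:
        # Small (or empty) dataset: keep everything.
        return 1.0
    return target_rows / total_rows


def spark_sample_to_pandas(spark_df, target_rows: int = 100_000, seed: int = 42):
    """Downsample a Spark DataFrame and convert the result to pandas.

    spark_df is expected to be a pyspark.sql.DataFrame. The default
    target_rows is an arbitrary illustrative value, not a recommendation.
    """
    fraction = sample_fraction(spark_df.count(), target_rows)
    if fraction < 1.0:
        # Spark's sample() is approximate: the row count will be close to,
        # but not exactly, target_rows.
        spark_df = spark_df.sample(withReplacement=False, fraction=fraction, seed=seed)
    # toPandas() collects all remaining rows to the driver, so the sample
    # has to fit in driver memory.
    return spark_df.toPandas()
```

You would call this once for the reference data and once for the current data, then pass the two pandas DataFrames to Evidently as usual. Using a fixed `seed` keeps the sample reproducible between runs.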
Aloha @emeli-dral,
I was wondering if there have been any updates or further discussion on plans for integration with Spark DataFrames? Extremely excited to see the library continue to grow :D
Cheers
Hi @echarso, @lowballedintern, we are now starting to work on the beta for Spark integration. I was wondering if any of you'd be open to chatting about how you want to see that implemented?
If yes, feel free to stop by Discord https://discord.com/invite/xZjKRaNp8b, drop a line to hello@evidentlyai.com, or maybe describe here how you'd see the ideal solution?
Hi, may I know if there is any update on the Spark integration? Is there a timeline for this? Thank you!
Is it possible to use Evidently with Spark DataFrames now? I have a huge amount of data in a Spark DataFrame, and converting it to a pandas DataFrame would be time-consuming. Are there other ways to integrate it? Let me know if I can use a beta version of the feature.
Hi @prity-k,
Spark support is currently in development. If you want to test it pre-release, here are the instructions (it currently works for several data drift metrics): https://github.com/evidentlyai/evidently/pull/806
Thank you for this nice project. I was wondering if there is going to be any integration with Spark DataFrames or big data in your work. Really sorry if that integration already exists and I couldn't find it.