evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0
5.46k stars 603 forks

Spark dataframes integration? #92

Open echarso opened 2 years ago

echarso commented 2 years ago

Thank you for this nice project. I was wondering whether you plan any integration with Spark DataFrames or other big data tooling. Really sorry if that integration exists and I couldn't find it.

emeli-dral commented 2 years ago

Hi @echarso, Thank you for this question.

Currently, the tool works only with pandas DataFrames or CSV files (CLI version). This means that you can convert a Spark DataFrame to a pandas DataFrame and then run Evidently. In this scenario, working with a smaller data sample makes sense.
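
As a rough illustration of that workaround, here is a minimal sketch of downsampling a Spark DataFrame before converting it to pandas. The helper name `spark_sample_to_pandas` and its `max_rows` parameter are hypothetical and not part of Evidently; the sketch assumes `spark_df` is a `pyspark.sql.DataFrame` and uses only its `count`, `sample`, `limit`, and `toPandas` methods.

```python
def spark_sample_to_pandas(spark_df, max_rows=100_000, seed=42):
    """Return at most `max_rows` rows of a Spark DataFrame as a pandas DataFrame.

    Hypothetical helper: `spark_df` is assumed to be a pyspark.sql.DataFrame.
    """
    total = spark_df.count()  # triggers a full pass over the data
    if total <= max_rows:
        # Small enough to collect as-is.
        return spark_df.toPandas()

    # Spark's sample() returns approximately `fraction` of the rows,
    # so limit() is applied afterwards as a hard cap.
    fraction = max_rows / total
    sampled = spark_df.sample(withReplacement=False, fraction=fraction, seed=seed)
    return sampled.limit(max_rows).toPandas()
```

You would call this once for the reference data and once for the current data, then pass the resulting pandas DataFrames to Evidently as usual. Keep in mind that `toPandas()` collects everything onto the driver, so `max_rows` should be chosen to fit in driver memory.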

We are adapting the tool to larger amounts of data; this will be addressed in upcoming releases.

lowballedintern commented 2 years ago

Aloha @emeli-dral ,

I was wondering if there have been any updates or further discussion on plans for integration with Spark DataFrames? Extremely excited to see the library continue to grow :D

Cheers

elenasamuylova commented 2 years ago

Hi @echarso, @lowballedintern, we are now starting to work on the beta for Spark integration. I was wondering if any of you'd be open to chatting about how you want to see that implemented?

If yes, feel free to stop by Discord https://discord.com/invite/xZjKRaNp8b, drop a line to hello@evidentlyai.com, or maybe describe here how you'd see the ideal solution?

luckyfgong commented 1 year ago

Hi, may I know if there is any update on the Spark integration? Is there a timeline for this? Thank you!

prity-k commented 1 year ago

Is it possible to use Evidently with a Spark DataFrame now? I have a huge amount of data in a Spark DataFrame, and converting it to a pandas DataFrame would be time-consuming. What other ways are there to integrate it? Let me know if I can use a beta version of the feature.

elenasamuylova commented 1 year ago

Hi @prity-k,

Spark support is currently in development. If you want to test it pre-release, here are the instructions (it currently works for several data drift metrics): https://github.com/evidentlyai/evidently/pull/806