Add WordMatch + WordNoMatch descriptor

evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

https://www.evidentlyai.com/evidently-oss

Apache License 2.0

5.44k stars 602 forks source link

Add WordMatch + WordNoMatch descriptor #1334

Closed jon-bown closed 1 month ago

jon-bown commented 1 month ago

Implement WordMatch, WordNoMatch, descriptor + feature and associated unit tests.

Resolves issue #1308 Resolves issue #1309

emeli-dral commented 1 month ago

Hi @jon-bown , This one looks good as well!

Could you please double check your src/evidently/features/_registry.py? It looks like paths to feature classes are invalid.

jon-bown commented 1 month ago

Hi @jon-bown , This one looks good as well!

Could you please double check your src/evidently/features/_registry.py? It looks like paths to feature classes are invalid.

Done! Also want to point out that the error you're getting with my ItemMatch PR would need to be resolved here also.

emeli-dral commented 1 month ago

Done! Also want to point out that the error you're getting with my ItemMatch PR would need to be resolved here also.

Thank you for the update, fair enough! I suggest you change a list to a tuple here as well, similar to what we’ve discussed in the related issue. This should help prevent the error and maintain consistency across both implementations.

Thanks for your attention to detail! 🙏

emeli-dral commented 1 month ago

Hi @jon-bown , It looks like there’s a small issue with the Python f-string here:

File "/home/runner/work/evidently/evidently/src/evidently/features/words_feature.py", line 198
    default_display_name=f"Text contains {self.mode.split("_")[1]} defined words"

I believe the string is being parsed incorrectly due to the nested double quotes inside the f-string. To resolve this, try changing the internal double quotes to single quotes ('_').

That should fix the issue!

jon-bown commented 1 month ago

Hi @emeli-dral I believe I've fixed all the issues with this PR. Can you let me know if anything else needs to change? Thanks!

emeli-dral commented 1 month ago

Hi @jon-bown, I had a chance to review your repo, and I really like how the descriptor is working—great job! However, I would like to request one more small change, apologies for the extra ask!

To keep things consistent and avoid potential confusion for users, I’d suggest making the WordMatch descriptor work directly with tuples, just like the ItemMatch descriptor. Having one descriptor use tuples and another use lists might be a bit unclear.

In the future, we plan to support lists as an input type in a more centralized way, and we’ll introduce list support across all relevant descriptors then.

Thanks again for your hard work!

jon-bown commented 1 month ago

Hi @emeli-dral, thanks for the update! As far as I can see, this implementation matches the ItemMatch/NoMatch implementation. Is there anything in particular you can see that needs to change? In the unit tests I convert the lists to tuples before they are fed to the feature, same as the other PR so it won't work with lists inside the data frame. I will also resolve the merge conflicts in the meantime.

emeli-dral commented 1 month ago

Hi @jon-bown, You’re absolutely right, that was my mistake — thank you for pointing that out! I've merged main into your branch, and I'll proceed with the merge as soon as the tests pass.

Thanks again for your patience and your thorough work on this!