MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
725 stars 68 forks source link

deps: Move matplotlib dependency to extras #68

Open philippefutureboy opened 9 months ago

philippefutureboy commented 9 months ago

Hello @MaartenGr!

A two high severity security alerts in pillow 9.50.0, a dependency of matplotlib has been brought to our attention by dependabot:

Pillow versions before v10.0.1 bundled libwebp binaries in wheels that are vulnerable to https://github.com/advisories/GHSA-hhrh-69hc-fgg7 (previously https://github.com/advisories/GHSA-j7hp-h8jx-5ppr). Pillow v10.0.1 upgrades the bundled libwebp binary to v1.3.2.

Heap buffer overflow in libwebp allow a remote attacker to perform an out of bounds memory write via a crafted HTML page.

Investigating further, we realized that we do not use pillow; nor do we use matplotlib. We found our only dependency relying on matplotlib was polyfuzz, and we do not use the functionality provided by this dependency.

Would you be willing to make matplotlib an optional dependency? It seems to be only required by the visualize_precision_recall function in https://github.com/MaartenGr/PolyFuzz/blob/e7540030d6dddc64bdb94c474ed6360dd7a5cdf7/polyfuzz/metrics.py#L56 .

I don't know what your end user usage of this function is like, but on our end we do not use it (we primarly use polyfuzz to catch duplicate strings in user-managed datasets), and as such having matplotlib and its entire dependency tree to manage in our already large dependency array is something we'd rather not have to do 😅

So what do you say? :)

Thanks a lot! (And thanks for this fantastic package ;))

Cheers! Philippe

MaartenGr commented 9 months ago

Thank you for the kind words!

I understand that since you are not using that dependency and the security alerts for pillow, removing it would make sense for your use case. However, if I were to remove matplotlib and put it as an optional dependency, that might break existing pipelines that are using the function and matplotlib in production, which is definitely something I want to prevent.

Instead, an option might be to only import the necessary matplotlib functions when they are used within PolyFuzz. It is not exactly the nicest way of coding but it might prevent errors if you install polyfuzz with --no-deps.

philippefutureboy commented 8 months ago

That's an interesting alternative. I'm not sure if that would translate properly to poetry since I don't think it supports per-package "no-deps" option. You can go that route if you want, but I doubt it will solve the issue on my end.

My suggestion is to leave as is and wait for when you are ready for a major (breaking) version to introduce this as an breaking change. What do you think?

MaartenGr commented 8 months ago

Ah, that makes things indeed difficult. Just to be sure, I believe this relates to that which seems to have solved that. If I am not mistaken then it might be possible to install it without dependencies in poetry. If so, then I would definitely prefer not to make any breaking changes.

philippefutureboy commented 8 months ago

The linked issue indeed relate to this. I see how one could do poetry install followed by pip install --no-dep <package==...>, but in this case it doesn't help with dependabot, since for dependabot to pick up on security vulnerability the dependency (polyfuzz in this case) has to be present in the lockfile for dependabot to analyze.

I understand that you do not want to introduce breaking changes, and I respect your choice. My suggestion is that if you do introduce a breaking change from another feature (or for example if you drop support to an older version of Python 3), that could be a fortuitous moment to integrate the migration of matplotlib as an optional dependency.

As far as I am concerned, for the time being we'll just stick with the status quo and deal with the vulnerability alerts that may come from polyfuzz's dependencies. Your willingness to have had this discussion is appreciated :)

I won't close in case you want to keep open for introduction alongside other breaking changes when the time comes; feel free to close!

MaartenGr commented 8 months ago

Ah right, that makes sense! I hoped that it would solve the issues but unfortunately, it does not.

I understand that you do not want to introduce breaking changes, and I respect your choice. My suggestion is that if you do introduce a breaking change from another feature (or for example if you drop support to an older version of Python 3), that could be a fortuitous moment to integrate the migration of matplotlib as an optional dependency.

Definitely! This is something I will take into account when a new version is released with breaking changes.

As far as I am concerned, for the time being we'll just stick with the status quo and deal with the vulnerability alerts that may come from polyfuzz's dependencies. Your willingness to have had this discussion is appreciated :)

Still, a shame that there isn't a quick fix for this. The only thing I can think of is installing a previous version of Pillow that might not have the vulnerability since PolyFuzz expects a range of versions. However, I would expect previous versions to have the same vulnerability.

I won't close in case you want to keep open for introduction alongside other breaking changes when the time comes; feel free to close!

Definitely! Leaving this open will also help understand what other users think of this issue! Thanks for brining this to the attention of all users.