PyCQA / bandit

Bandit is a tool designed to find common security issues in Python code.
https://bandit.readthedocs.io
Apache License 2.0
6.37k stars 603 forks source link

lxml guidance is not useful #767

Open mwichmann opened 2 years ago

mwichmann commented 2 years ago

Describe the bug

Just so this is recorded somewhere, it's certainly not a show-stopper bug:

If your program uses lxml, it's going to get warnings like this:

>> Issue: [B410:blacklist] Using lxml to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml with the equivalent defusedxml package.

Except that the defusedxml.lxml package was never really real, it was intended as an example, and to stop people from using it as-is it's now deprecated and planned for removal. Probably bandit should not be suggesting that? The problem is - what else would one suggest?

See: https://pypi.org/project/defusedxml/#defusedxml-lxml

Reproduction steps

Run bandit on something that uses lxml.

Expected behavior

Expect: "a useful suggestion". Possibly there's no good suggestion to make here?

Bandit version

1.7.0 (Default)

Python version

3.9 (Default)

Additional context

No response

jpodivin commented 2 years ago

I suppose lxml could have been brought into compliance in the meantime? Have you asked the maintainer ? https://github.com/scoder

djbrown commented 2 years ago

435 was falsely closed as invalid some time ago..

defusedxml.lxml is deprecated. bandit should make smarter checks or at least stop suggesting defusedxml.lxml

AFAIK lxml actually provides secure parsing by now, users just have to apply it correctly.

djbrown commented 2 years ago

also #261 was closed as completed - but it isn't and #716 seems to be a more specific duplicate of this one. I suggest discussing in this issue, since it's about lxml as a whole.. @ericwb can you confirm this is a valid issue after all?

djbrown commented 2 years ago

Security assessment of lxml as of https://github.com/tiran/defusedxml#python-xml-libraries:

  • Lxml is protected against billion laughs attacks and doesn't do network lookups by default.
  • libxml2 and lxml are not directly vulnerable to gzip decompression bombs but they don't protect you against them either.
  • Library has (limited) XInclude support but requires an additional step to process inclusion.

Bandit could check for insecure use of lxml and suggest using lxml safely

djbrown commented 11 months ago

Recently defusedxml got an update on safety information for lxml:

defusedxml.lxml

DEPRECATED The module is deprecated and will be removed in a future release.

lxml is safe against most attack scenarios. lxml uses libxml2 for parsing XML. The library has builtin mitigations against billion laughs and quadratic blowup attacks. The parser allows a limit amount of entity expansions, then fails. lxml also disables network access by default. libxml2 lxml FAQ lists additional recommendations for safe parsing, for example counter measures against compression bombs.

see https://github.com/tiran/defusedxml/issues/38 and https://github.com/tiran/defusedxml/pull/98 I'm thinking about creating a pull request, removing hints to defusedxml.lxml from bandit. @tiran would you endorse that?

nicostubi commented 2 months ago

Hello @djbrown,

Indeed, lxml has been deprecated within the package defusedxml. It would be useful to remove references to defusedxml.xml and give some guidance. There is still extra care to make when using lxml, isn't it?

Both lxml and defusedxml do a good job at describing precautionary steps to put in place, e.g.

So, Bandit could simply reference these documentations.

Your thoughts?

BR Nicolas