astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
32.92k stars 1.1k forks source link

new rule - replace d.update({'key': 'value'}) with d['key'] = 'value' #13533

Open spaceby opened 1 month ago

spaceby commented 1 month ago

I suggest adding a rule that detects this

d = {}
d.update({'key': 'value'})

and replacing it

d = {}
d['key'] = 'value'

As a result, the code becomes simpler and more understandable.

zanieb commented 1 month ago

This make sense, but is just a style rule right? Is there a performance diff since you're skipping creating a dictionary?

ugochukwu-850 commented 1 month ago

I like this issue, I am a new her btw [I'm also new to OSC], But I did a little research and its variable , so I am not sure its safe to change based off only on aesthetics . Basically at smaller and mid sized dict updates its ok to just assign but on larger updates creating a secondary dict as update does , would greatly increase the bulk update speed.

While its arguable , since the performance effects is variable maybe it should be optional.

AlexWaygood commented 1 month ago

This make sense, but is just a style rule right? Is there a performance diff since you're skipping creating a dictionary?

Even for small dictionaries, direct assignment is indeed quite a bit faster according to a very naive benchmark:

~/dev % python -m timeit -s "d = {}" "d.update({'key': 'value'})"            
5000000 loops, best of 5: 61.2 nsec per loop
~/dev % python -m timeit -s "d = {}" "d['key'] = 'value'"        
20000000 loops, best of 5: 12.2 nsec per loop

And this is what you'd expect, since the bytecode created for direct assignment is a lot simpler:

~/dev/cpython % ./python.exe
Python 3.14.0a0 (heads/main:986a4e1b6fc, Sep 26 2024, 12:02:18) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def x():
...     d = {}
...     d.update({'key': 'value'})
...     
>>> def y():
...     d = {}
...     d['key'] = 'value'
...     
>>> import dis
>>> dis.dis(x, adaptive=True)
  1           RESUME                   0

  2           BUILD_MAP                0
              STORE_FAST               0 (d)

  3           LOAD_FAST                0 (d)
              LOAD_ATTR                1 (update + NULL|self)
              LOAD_CONST               1 ('key')
              LOAD_CONST               2 ('value')
              BUILD_MAP                1
              CALL                     1
              POP_TOP
              RETURN_CONST             0 (None)
>>> dis.dis(y, adaptive=True)
  1           RESUME                   0

  2           BUILD_MAP                0
              STORE_FAST               0 (d)

  3           LOAD_CONST               1 ('value')
              LOAD_FAST                0 (d)
              LOAD_CONST               2 ('key')
              STORE_SUBSCR
              RETURN_CONST             0 (None)

But as I said, this benchmark is very naive:

@spaceby would your proposed rule only apply to dictionaries with a single item being passed to .update? Or do you also think it should suggest this?

-d.update({
-    "x": 1,
-    "y": 2,
-    "z": 3,
-})
+d["x"] = 1
+d["y"] = 2
+d["z"] = 3
spaceby commented 1 month ago

I think the rule should only apply to a dictionary with single item. A rule similar in meaning to B009 - rewrite the code to something simpler.

AlexWaygood commented 1 month ago

Makes sense... though I think we should explicitly state this in the docs for the rule, or we'll inevitably get a PR "improving" the rule so that it also covers dictionaries with multiple items 😄

autinerd commented 1 month ago

For the update call with multiple entries, maybe the |= operator would be an idea? It seems to be slightly faster :thinking:

❯ python -m timeit -s "d = {}" "d |= {'x': 1,'y': 2,'z': 3}"
2000000 loops, best of 5: 141 nsec per loop
❯ python -m timeit -s "d = {}" "d.update({'x': 1,'y': 2,'z': 3})"
2000000 loops, best of 5: 179 nsec per loop

And of course with single items, |= should also be replaced with directly assigning the value to the key.

AlexWaygood commented 1 month ago

For the update call with multiple entries, maybe the |= operator would be an idea? It seems to be slightly faster 🤔

Possibly, but I think that would be a different rule

MichaReiser commented 1 month ago

This make sense, but is just a style rule right? Is there a performance diff since you're skipping creating a dictionary?

I still think it's worthwhile to frame the motivation properly because, IMO, the performance difference seems too marginal to justify making it a rule.

tdulcet commented 1 month ago

Maybe this would be a separate rule, but perhaps this rule could be generalized to avoid creating unnecessary intermediate representations. For example, these:

l += ["value"]
l += ["value 1", "value 2"]
s |= {"value"}
s |= {"value 1", "value 2"}
d |= {"key": "value"}
d.update({"key": "value"})
# ect.

could potentially be rewritten as:

l.append("value")
l.extend(("value 1", "value 2"))
s.add("value")
s.update(("value 1", "value 2"))
d["key"] = "value"
d["key"] = "value"
# ect.

This was first proposed in https://github.com/astral-sh/ruff/issues/8877#issuecomment-1835998576.

AlexWaygood commented 1 month ago

This make sense, but is just a style rule right? Is there a performance diff since you're skipping creating a dictionary?

I still think it's worthwhile to frame the motivation properly because, IMO, the performance difference seems too marginal to justify making it a rule.

I think the performance difference is significant enough that it could easily make a difference in a hot loop somewhere... but I agree that the motivation is primarily stylistic here, and that this is the kind of performance difference that could easily change as performance improvements are made to CPython itself. So I agree that this should be classified as a stylistic rule.

Maybe this would be a separate rule, but perhaps this rule could be generalized to avoid creating unnecessary intermediate representations. For example, these:

Yes, this makes sense to me; all of these are essentially different manifestations of the same antipattern.

At this point I think I've seen enough support for this that I'd be happy to accept a PR implementing this rule (in the RUF category, I suppose, unless there have been any other linters to already implement this?).