Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
Apache License 2.0
1.04k stars 209 forks source link

Dissolve for loop that concatenates strings to str.join #329

Closed boneyag closed 1 year ago

boneyag commented 1 year ago

Replace the for loop on lines 14 and 15 with str.join.

andbue commented 1 year ago

Just out of curiosity (you opened similar PRs in some other repositories): is this created by some automated spam-PR-generator you wrote to increase the amount of code you published on github? If it would actually improve the code and the performance, this would probably be some kind of service to the community. Unfortunately, the suggested change is shorter but slower (at least in my quick test) than the (indeed unpythonic) original solution since it first converts a view object to a generator that is then converted to a list and then concatenated to a string.

boneyag commented 1 year ago

I have created similar PRs to other repositories as I detected similar occurrences in them. PRs are not created using a tool, but rather a manual submission after verifying the detections are not false positives. Before making the PRs, I investigated the issue in Stack Overflow and similar websites. According to those sites, join has better performances (compared to operations that use a loop). My purpose is to figure out if there is a tool that changes X -> Y, would the community benefit from it. In this particular case, it seems not.

andbue commented 1 year ago

Here is my quick test run:

In [1]: testdict = {key: None for key in ["test", "spam", "strings", "dict", "keys", "append"]}

In [2]: %%timeit
   ...: s = ""
   ...: for k in testdict.keys():
   ...:     s += k
438 ns ± 10.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %%timeit
   ...: s = "".join([k for k in testdict.keys()])

543 ns ± 2.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %%timeit
   ...: s = "".join(testdict.keys())
343 ns ± 9.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4] would be the best solution here, both in clarity and performance.

Sorry for being a bit harsh in my first reaction – in general, improving code in open source projects is, of course, not a bad idea. Some very tiny optimisations (that might even be detrimental) in hardly ever used parts of the code via pull request, on the other hand, seem to me as a maintainer a bit questionable.

boneyag commented 1 year ago

Hi @andbue, thank you for the clarification. I understand that you cannot simply accept all changes submitted when they do not actually make an improvement.