Sm00thix / IKPLS

Fast CPU and GPU Python implementations of Improved Kernel PLS by Dayal and MacGregor (1997) and Shortcutting Cross-Validation by Engstrøm (2024).
https://ikpls.readthedocs.io/en/latest/
Apache License 2.0
8 stars 3 forks source link

pylint-friendliness & very long lines #9

Closed parmentelat closed 5 months ago

parmentelat commented 5 months ago

Hey there

as part of the review initiated in https://github.com/openjournals/joss-reviews/issues/6533, I am having a first stab at this repo :)

a very first and general comment that I have about the Python code relates to pylint assessment:

given the heavy use of 'math-like' variables, I guess it is acceptable to allow for the invalid-name issues (for example if I do allow that on one source file (ikpls/jax_ikpls_alg_2.py) its score rises from 0/10 to 4.15/10)

so i guess my proposal would be

Sm00thix commented 5 months ago

Thank you for your feedback. I appreciate your points :) I will incorporate your suggestions.

Sm00thix commented 5 months ago

Hi @parmentelat

I have now addressed this issue in a best-effort manner. The changes are made in the latest version of the dev branch. If you agree that this resolves the issue then I will merge the dev and main branch and close the issue.

Executing pylint with --max-line-length 88 --disable invalid-name will now give scores of 8.81/10 for examples/ 9.22/10 for ikpls/ 9.08/10 for tests/

Below I give an overview of the remaining issues raised by pylint.

In examples/, pylint complains that some code is repeated between the different files. While this is true, I do not believe it is an issue. I think it is best for usability if each file in examples/ is standalone and independent of other files.

In tests/, pylint mainly complains about functions (tests) in test_ikpls.py being having too many statements. I do not believe this is an issue as it is mainly due to making the same calls to all the different PLS algorithms to compare their respective outputs. I could probably spend a lot of time on restructuring the tests to reduce the total number of statements. But I do not believe this will improve readability for someone who wants to delve into the tests.

In ikpls/, pylint mainly complains about functions taking too many arguments (more than 5), having too many statements (more than 50), too many locals (more than 15), and too many instance attributes (more than 7). However, these are mainly due to the nature of the implemented algorithms. I could spend time on dividing the functions to sub functions and chain them together. However, I do not believe this will improve readability, and it will certainly be detrimental to execution time due to increased function calls. As fast execution time is the main contribution of this work - especially for the implementation using the fast cross-validation algorithm - I believe dividing the functions to smaller sub functions should not be done. The exception being the JAX implementations that make extensive use of sub functions. However, these functions are JIT-compiled and I suspect the compiler will optimize the function calls away.

Sm00thix commented 5 months ago

I have merged these changes to the main branch. Do you agree that the issue is resolved? @parmentelat After my previous comment, some additional changes were made. Now, executing pylint with --max-line-length 88 --disable invalid-name will now give scores of 8.81/10 for examples/ 9.24/10 for ikpls/ 9.09/10 for tests/

parmentelat commented 5 months ago

great, this is much more legible now, thanks !