Open-MSS / MSS

A QT application, a OGC web map server, a collaboration server to plan atmospheric research flights.
https://open-mss.github.io
Apache License 2.0
56 stars 68 forks source link

builtin checker findings for mslib/plugins/io/csv.py and mslib/utils/time.py #2312

Open ReimarBauer opened 5 months ago

ReimarBauer commented 5 months ago

@matrss seems the builtin checker was updated?

I rather don't want to rename the csv io plugin, the time utils module can maybe refactored renamed to times (just plural?)

@joernu76 What do you think?

I assume we need to solve this in stable.

Successfully installed flake8-7.0.0 flake8-builtins-2.4.0 mccabe-0.7.0 pycodestyle-2.11.1 pyflakes-3.2.0
mslib/plugins/io/csv.py:0:1: A005 the module is shadowing a Python builtin module "csv"
mslib/utils/time.py:0:1: A005 the module is shadowing a Python builtin module "time"
2     A005 the module is shadowing a Python builtin module "csv"
2
matrss commented 5 months ago

@matrss seems the builtin checker was updated?

Yes, there was an update on 2024-03-29 which added this lint rule.

I rather don't want to rename the csv io plugin

What's the reason for that? Is it a public API?

ReimarBauer commented 5 months ago

the csv plugin is used in many users configurations. It is for storing / reading in a different format a flight track.

When we rename this we need to check if we need a migration script.

https://mss.readthedocs.io/en/stable/plugins.html

on default we have import enabled, but export can be enabled by user configuration. Looks like a migration script then or we ignore this one.

ReimarBauer commented 5 months ago

in stable we should ignore both, because a fix is an API change.

matrss commented 5 months ago

OK, so it is part of the public API (I don't think we define what is and isn't, right? Maybe that should be made clear at some point). Honestly, I would be fine with just ignoring those two specific instances of this issue (until they actually become a problem), but leave the lint rule enabled so we don't inadvertently introduce more.

matrss commented 5 months ago

When we rename this we need to check if we need a migration script.

Another thought: a migration script would be the wrong thing to do IMO, since you would have to parse python to do that (and due to pythons dynamic nature you probably still couldn't tell if a program uses this module without executing it with every possible configuration).

A deprecation warning on import and a complete removal in a later release would be more sensible.

ReimarBauer commented 5 months ago

We have to migrate the users entries in the msui_settings.json related to our code changes. json not python. We did such migration already in the past.

But I think we should keep what we currently have on module definitions and wait for a real problem or until the plugin concepts gets refactored. In principle the docking widgets could also be plugins. At the moment there are more contributions for such parts we likly want to have these also pluggable. At this time we would also change the naming scheme for those IO plugins.

It is a good idea to leave the checker enabled to not introduce more conflicts.

ReimarBauer commented 5 months ago

Yes, we should set a deprecation warning, that it will be changed in a later version. At the time we know the version we can adjust it.

It is also a good remark for us.

ReimarBauer commented 4 months ago

looks like we don't have this check in stable, seen that it did not fail on an other PR, so we can fix this in develop

ReimarBauer commented 4 months ago

oh, running that locally

mslib/plugins/io/__init__.py:0:1: A005 the module is shadowing a Python builtin module "io"
mslib/plugins/io/csv.py:0:1: A005 the module is shadowing a Python builtin module "csv"
mslib/utils/time.py:0:1: A005 the module is shadowing a Python builtin module "time"
ReimarBauer commented 4 months ago

hmm, it is not good that we named that io, but since when is the io library a builtin?

ReimarBauer commented 4 months ago

We don't touch io plugins for this release but refactor time. https://github.com/Open-MSS/MSS/issues/2315

I don't think we need a deprecation warning, but we can do in stable. The new name is time_handling. I will later add a HINT in CHANGES. There is one occurancy in wms.py.

matrss commented 4 months ago

hmm, it is not good that we named that io, but since when is the io library a builtin?

It exists in the standard library at least since 2.6.

ReimarBauer commented 4 months ago

Yes but not Listed as builtins in the docs but Standard library

ReimarBauer commented 4 months ago

https://docs.python.org/3/reference/datamodel.html#built-in-functions A built-in function object is a wrapper around a C function. Examples of built-in functions are len() and math.sin() (math is a standard built-in module).

https://docs.python.org/3/library/functions.html#built-in-functions shadowing those is a major problem

For all the others which we have in our stack I am also interested in finding name conflicts. The function currently shows additionaly only conflicts to stdlib for our source. Any to Any would also be interesting.

matrss commented 4 months ago

I am pretty sure that this new lint rule in flake8-builtins considers "built-in modules" to just be modules that are part of the standard library, and does not differentiate between if they are pure python or a wrapper around C.

joernu76 commented 4 months ago

Mmmh. I do no think that we should avoid any modules having the same name as standard library ones if it is somewhere on a sublevel. We shouldn't import these in a way that would shadow the standard library ones, though. This here, e.g. is safe: from mslib.plugins.io.csv import load_from_csv, save_to_csv and I strongly disagree with the lint nagging.

At some point in time there will be no simple names left due to the ever more comprehensive standard libraries.

I.e. we should avoid import mslib.plugins.io.csv as csv but not the usage above.

joernu76 commented 4 months ago

E.g. the airdata.py module uses import time, whereby our time module is in the same directory. this import uses the standard library time module as intended. To use our time module, one would need to import mslib.utils.time.

What is the error scenario, where our time module would really shadow the standard library one?

joernu76 commented 4 months ago

I suggest disabling this check in our flake8-checker.

joernu76 commented 4 months ago

This is also, incidentally, the kind of issue, why I would like a more stable test environment.

matrss commented 4 months ago

I think I agree with @joernu76 on this now, the issue isn't the module name, but if it is used in a way in which it does shadow builtins. ~Not sure if we have any of those, the linter doesn't search for them~. EDIT: actually, that is precisely why we introduced flake8-builtins and we indeed have eliminated all of those occurrences (apart from a few which were specifically ignored, like id fields in database models).

This is also, incidentally, the kind of issue, why I would like a more stable test environment.

2298 would fix this (when the flake8 workflow is changed to use pixi as well), i.e. with the changes from there this kind of issue would be isolated to the lockfile update PR, instead of just randomly appearing on develop.

matrss commented 4 months ago

E.g. the airdata.py module uses import time, whereby our time module is in the same directory. this import uses the standard library time module as intended. To use our time module, one would need to import mslib.utils.time.

What is the error scenario, where our time module would really shadow the standard library one?

I did look into this again to understand why the lint rule was introduced in the first place and there is indeed a reason for it:

Take the following simplified module structure:

.
`-- package
    |-- __init__.py
    |-- module1
    |   |-- __init__.py
    |   |-- csv.py
    |   |-- direct_import.py
    |   |-- indirect_import.py
    |   `-- time.py
    `-- module2
        |-- __init__.py
        `-- some_module.py

with the following file contents:

$ cat package/module1/time.py 
print("package/module1/time.py imported")
$ cat package/module1/csv.py 
print("package/module1/csv.py imported")
$ cat package/module1/direct_import.py 
import time
import csv

print("package/module1/direct_import.py imported")
$ cat package/module1/indirect_import.py 
import package.module2.some_module

print("package/module1/indirect_import.py imported")
$ cat package/module2/some_module.py 
import csv
import time

print("package/module2/some_module.py imported")

(the __init__.py are empty)

PYTHONPATH is set to $PWD.


If you now import package.module1.direct_import in python everything is fine:

$ python3 -c 'import package.module1.direct_import'
package/module1/direct_import.py imported

But if you run package/module1/direct_import.py directly csv will be shadowed:

$ python3 package/module1/direct_import.py 
package/module1/csv.py imported
package/module1/direct_import.py imported

This also happens for indirect imports, i.e. when another module that is imported requires csv:

$ python3 package/module1/indirect_import.py 
package/module1/csv.py imported
package/module2/some_module.py imported
package/module1/indirect_import.py imported

(I think this would even affect a third-party package importing csv, when only that third-party package was used in indirect_import.py)


The reason for this is that the directory of the .py file is implicitly added to the python search path if it is directly executed. Python will therefore find the csv in package/module1/ first and take it from there. When importing the module this doesn't happen, since the python interpreter is started from the top-level directory and therefore only this top-level directory is implicitly added to the search path, but not the package/module1/ sub-directory.

You might now wonder why this only happens for csv, but not for time. AFAICT, the reason is this: time is directly compiled into the python interpreter (at least with the version I am using here). It is part of this tuple: https://docs.python.org/3/library/sys.html#sys.builtin_module_names With my current python version this is the full list of modules directly included in the interpreter:

('_abc', '_ast', '_codecs', '_collections', '_functools', '_imp', '_io', '_locale', '_operator', '_signal', '_sre', '_stat', '_string', '_symtable', '_thread', '_tokenize', '_tracemalloc', '_warnings', '_weakref', 'atexit', 'builtins', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'posix', 'pwd', 'sys', 'time', 'xxsubtype')

Notably, csv is not in there. csv is only part of the (much larger) set of stdlib names: https://docs.python.org/3/library/sys.html#sys.stdlib_module_names

So, AFAICT, builtin -- as in compiled as part of the interpreter -- modules can't be shadowed, but other standard library modules can (there is also the concept of "frozen" modules, I am not sure if those also affect how shadowing works).


Does this affect us? No, at the moment it doesn't. Could it? Yes, definitely, because we directly call python files in development (something that we could replace with using an editable installation of MSS, making e.g. msui call the development version, but that is a different story).

As an example: mslib/utils/mssautoplot.py is a directly callable script. Whenever it is called, mslib/utils/ will be part of the python search path. If it wasn't for the distinction between "normal" stdlib and builtin, then mslib/utils/time.py would shadow the builtin time module, and every import time anywhere in this instance of the python interpreter would suddenly import mslib/utils/time.py. So while this isn't an issue with time.py right now, if we ever introduced a mslib/utils/csv.py this will become an issue.

Likewise, we have e.g. mslib/msui/msui.py (and others for mscolab, mswms, etc.). If there was any module in that directory that was named the same as a standard library module then suddenly that module would be picked up instead. Thankfully, there are none in there.


Considering this, I think we should re-enable the lint rule. As long as we directly call python files inside mslib/ this lint rule does make sense. For now I would just ignore specifically the two instances of this issue that we have, but I would prefer not to introduce more of these.

What do you think?