Open eddiebergman opened 8 months ago
@LennartPurucker Please let me know what you think of this when you have time. It's come up a few times now
I am actually not sure about the first case either:
# Here the `None` indicates the absence of a value and some default directory/name is fine.
def f(working_dir: Path | None = None):
"""Uses timestamp if left as None"""
...
If this is a private function, None
is good. But if this is a public function, then I would also set this to "auto" or the option "timestamp". Right now, from the function definition, while using the function in an editor, I could not tell what happens by default without reading the docstring.
Sidenote: IMO, the typical behavior of defaulting to timestamp from a None
input is very unintuitive sometimes and actually not fully thread/multi-processing safe! Thus, I would rather know about this as a user. If the function prints the paths anyway, an additional warning might not be needed.
Otherwise, this is a very good thing. Also, I am not yet fully sure about the code overhead of custom warnings. I guess custom warnings are better code quality and allow for catching/ignoring them, right?
I am actually not sure about the first case either:
# Here the `None` indicates the absence of a value and some default directory/name is fine. def f(working_dir: Path | None = None): """Uses timestamp if left as None""" ...
If this is a private function,
None
is good. But if this is a public function, then I would also set this to "auto" or the option "timestamp". Right now, from the function definition, while using the function in an editor, I could not tell what happens by default without reading the docstring.Sidenote: IMO, the typical behavior of defaulting to timestamp from a
None
input is very unintuitive sometimes and actually not fully thread/multi-processing safe! Thus, I would rather know about this as a user. If the function prints the paths anyway, an additional warning might not be needed.
I agree in principle but as a matter of ergonomics, I would still call this None
. I can't articulate why (which is why there is this issue, trying to figure it out) but it's something along the lines of a default parameter can be provided which works well in most cases, which otherwise would require a user to explicitly provide one, or the function would not work.
In contrast to the other example, we are using a hueristic to make a choice amongst a discrete set, which could otherwise cause a lot of hidden behavior to occur. Another example being the threadpoolctl heuristic.
The question is where to draw a line. An optional parameter where the default parameter value is None
is different than an auto parameter.
I guess I would say that an optional with None
as the default will be set to some default which is independant of the system you are on and the other parameters, i.e. it will always be a timestamped path. Where as an auto will apply some hueristic depending on the other parameters or the system in which the program is being run.
Objections, comments, feedback?
Otherwise, this is a very good thing. Also, I am not yet fully sure about the code overhead of custom warnings. I guess custom warnings are better code quality and allow for catching/ignoring them, right?
I would consider it low overhead on the development side and yes, it allows for distuinguishing them more easily. Anything that wants to interact with that warning can do so by type rather than by content. This allows us to for example, change the wording in the warning without breaking any system that relies on the type for functionality.
This also applies to errors:
# Needed to make sure a certain exception occurs
# Will break if someone decides to make it a proper sentence and capitalize "Blub"
with pytest.raises(ValueError, match="blub should be provided"):
...
# Only breaks if the underlying exception type changes
with pytest.raises(BlueError):
...
You can see this in effect in one of the doc hooks, which disables warnings from the running examples in the doc strings: https://github.com/automl/amltk/blob/0af8f7fe77cebb87e5524ad1934de22a74f0b338/docs/hooks/cleanup_log_output.py#L21-L31
From our in-person discussion:
Cases:
So warnings can be done at Runtime and I don't want to automate this in anyway with concrete types... too much overhead. However I would assume most people would have access to some sort of LSP that will give them the type definitions. We can inject this information there at the very least.
But I thought at least from type definitions we can be a bit more explicit, for example:
It should have a near-zero runtime impact and is mainly for type checking, you can see a few variants here:
from __future__ import annotations
from pathlib import Path
from typing import TypeAlias, TypeVar
T = TypeVar("T")
# Basically says these types are either given a type `T` or are `None`
# The reason for `| None` is so that setting the default value to `None` works correctly
# in signatures
Auto: TypeAlias = type[T] | None
Default: TypeAlias = type[T] | None
# Usage 1
def f1(display: Auto[bool] = None) -> None:
if display is None:
# Make some automatic choice
...
# Or this? _Auto would be a singleton defined once
_Auto = None
def f2(display: Auto[bool] = _Auto) -> None:
if display is _Auto:
# Make some automatic choice
...
# Same works for default
def h(path: Default[Path] = None) -> None:
if path is None:
# Create some default path
...
I'll be honest, I feel all of this will just make things more complicated and confusing. I would rather in the end just use the Path | None = None
idiom.
Maybe could do the following to keep it as simple as possible
Auto = None
def f(p: Path | None = Auto):
pass
The LSP still gives enough info and it doesn't deref to be None
. (At least for pyright
which is the core of vscode's pylance
, so they should have similar type information displayed)
As things have developed, there's been a few instances where we can set some default in an automatic fashion, however it may not be clear to the user that we are making a hueristic choice, and it would be better if they specified this.
While it was not declared anywhere, I think it makes sense to make some rule set that essentially:
AutomaticParameterWarning
introduced in #244.This should remain an issue until:
UserWarning
and automatic parameters in AMLTK have been unifiedUserWarning
derived from the use of"auto"
should be replaced be a unique warning type.f(x: T | None = None)
that make heuristic choices should be converted to usef(x: T | Literal["auto"] = "auto")
. It's important to distinguish the absence of a value with the activation of an automatic heuristic. It might be good to give examples here: