Safe-DS / Library-Analyzer

Analysis of Python libraries and of code that uses them.
https://library-analyzer.safeds.com

Missing constant annotation when the same value is sometimes converted to string #34

Open · Aclrian opened this issue 2 years ago

Aclrian commented 2 years ago

URL Hash

#/sklearn/sklearn.preprocessing._data/scale/with_mean

Actual Annotation Type

@optional

Actual Annotation Inputs

```json5
{
    "target": "sklearn/sklearn.preprocessing._data/scale/with_mean",
    "authors": [
        "$autogen$"
    ],
    "defaultType": "boolean",
    "defaultValue": true
}
```

Expected Annotation Type

@constant

Expected Annotation Inputs

with value True (boolean)
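
Spelled out in the same format as the @optional payload above, the expected inputs would presumably look like this (a sketch; the field names are assumed to mirror the @optional annotation and the actual @constant schema may differ):

```json5
{
    "target": "sklearn/sklearn.preprocessing._data/scale/with_mean",
    "authors": [
        "$autogen$"
    ],
    // assumed fields, copied from the @optional payload above
    "defaultType": "boolean",
    "defaultValue": true
}
```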

Minimal API Data (optional)

Minimal API Data for `sklearn/sklearn.preprocessing._data/scale/with_mean`:

```json5
{
  "schemaVersion": 1,
  "distribution": "scikit-learn",
  "package": "sklearn",
  "version": "1.1.1",
  "modules": [
    {
      "id": "sklearn/sklearn.preprocessing",
      "name": "sklearn.preprocessing",
      "imports": [],
      "from_imports": [
        { "module": "sklearn.preprocessing._data", "declaration": "add_dummy_feature", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "binarize", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "Binarizer", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "KernelCenterer", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "maxabs_scale", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "MaxAbsScaler", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "minmax_scale", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "MinMaxScaler", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "normalize", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "Normalizer", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "power_transform", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "PowerTransformer", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "quantile_transform", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "QuantileTransformer", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "robust_scale", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "RobustScaler", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "scale", "alias": null },
        { "module": "sklearn.preprocessing._data", "declaration": "StandardScaler", "alias": null },
        { "module": "sklearn.preprocessing._discretization", "declaration": "KBinsDiscretizer", "alias": null },
        { "module": "sklearn.preprocessing._encoders", "declaration": "OneHotEncoder", "alias": null },
        { "module": "sklearn.preprocessing._encoders", "declaration": "OrdinalEncoder", "alias": null },
        { "module": "sklearn.preprocessing._function_transformer", "declaration": "FunctionTransformer", "alias": null },
        { "module": "sklearn.preprocessing._label", "declaration": "label_binarize", "alias": null },
        { "module": "sklearn.preprocessing._label", "declaration": "LabelBinarizer", "alias": null },
        { "module": "sklearn.preprocessing._label", "declaration": "LabelEncoder", "alias": null },
        { "module": "sklearn.preprocessing._label", "declaration": "MultiLabelBinarizer", "alias": null },
        { "module": "sklearn.preprocessing._polynomial", "declaration": "PolynomialFeatures", "alias": null },
        { "module": "sklearn.preprocessing._polynomial", "declaration": "SplineTransformer", "alias": null }
      ],
      "classes": [],
      "functions": [
        "sklearn/sklearn.preprocessing._data/scale"
      ]
    }
  ],
  "classes": [],
  "functions": [
    {
      "id": "sklearn/sklearn.preprocessing._data/scale",
      "name": "scale",
      "qname": "sklearn.preprocessing._data.scale",
      "decorators": [],
      "parameters": [
        {
          "id": "sklearn/sklearn.preprocessing._data/scale/with_mean",
          "name": "with_mean",
          "qname": "sklearn.preprocessing._data.scale.with_mean",
          "default_value": "True",
          "assigned_by": "NAME_ONLY",
          "is_public": true,
          "docstring": {
            "type": "bool, default=True",
            "description": "If True, center the data before scaling."
          },
          "type": {}
        }
      ],
      "results": [],
      "is_public": true,
      "reexported_by": [
        "sklearn/sklearn.preprocessing"
      ],
      "description": "Standardize a dataset along any axis.\n\nCenter to the mean and component wise scale to unit variance.\n\nRead more in the :ref:`User Guide `.",
      "docstring": "Standardize a dataset along any axis.\n\n Center to the mean and component wise scale to unit variance.\n\n Read more in the :ref:`User Guide `.\n\n Parameters\n ----------\n X : {array-like, sparse matrix} of shape (n_samples, n_features)\n The data to center and scale.\n\n axis : int, default=0\n axis used to compute the means and standard deviations along. If 0,\n independently standardize each feature, otherwise (if 1) standardize\n each sample.\n\n with_mean : bool, default=True\n If True, center the data before scaling.\n\n with_std : bool, default=True\n If True, scale the data to unit variance (or equivalently,\n unit standard deviation).\n\n copy : bool, default=True\n set to False to perform inplace row normalization and avoid a\n copy (if the input is already a numpy array or a scipy.sparse\n CSC matrix and if axis is 1).\n\n Returns\n -------\n X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)\n The transformed data.\n\n Notes\n -----\n This implementation will refuse to center scipy.sparse matrices\n since it would make them non-sparse and would potentially crash the\n program with memory exhaustion problems.\n\n Instead the caller is expected to either set explicitly\n `with_mean=False` (in that case, only variance scaling will be\n performed on the features of the CSC matrix) or to call `X.toarray()`\n if he/she expects the materialized dense array to fit in memory.\n\n To avoid memory copy the caller should pass a CSC matrix.\n\n NaNs are treated as missing values: disregarded to compute the statistics,\n and maintained during the data transformation.\n\n We use a biased estimator for the standard deviation, equivalent to\n `numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to\n affect model performance.\n\n For a comparison of the different scalers, transformers, and normalizers,\n see :ref:`examples/preprocessing/plot_all_scaling.py\n `.\n\n .. warning:: Risk of data leak\n\n Do not use :func:`~sklearn.preprocessing.scale` unless you know\n what you are doing. A common mistake is to apply it to the entire data\n *before* splitting into training and test sets. This will bias the\n model evaluation because information would have leaked from the test\n set to the training set.\n In general, we recommend using\n :class:`~sklearn.preprocessing.StandardScaler` within a\n :ref:`Pipeline ` in order to prevent most risks of data\n leaking: `pipe = make_pipeline(StandardScaler(), LogisticRegression())`.\n\n See Also\n --------\n StandardScaler : Performs scaling to unit variance using the Transformer\n API (e.g. as part of a preprocessing\n :class:`~sklearn.pipeline.Pipeline`).\n\n "
    }
  ]
}
```

Minimal Usage Store (optional)

Minimal Usage Store for `sklearn/sklearn.preprocessing._data/scale/with_mean`:

```json5
{
  "schemaVersion": 1,
  "module_counts": {
    "sklearn/sklearn.preprocessing": 50891
  },
  "class_counts": {},
  "function_counts": {
    "sklearn/sklearn.preprocessing._data/scale": 314
  },
  "parameter_counts": {
    "sklearn/sklearn.preprocessing._data/scale/with_mean": 10
  },
  "value_counts": {
    "sklearn/sklearn.preprocessing._data/scale/with_mean": {
      "True": 311,
      "'True'": 3
    }
  }
}
```
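
These value counts are what trigger the bug: the boolean `True` and the string `'True'` are recorded as two distinct values, so the generator never sees a single constant value. A minimal sketch of the filtering idea (hypothetical code, not Library-Analyzer's actual implementation; `parse` and `declared_type` are assumptions for illustration):

```python
import ast

# Value counts as recorded in the usage store above.
value_counts = {"True": 311, "'True'": 3}

# The parameter is documented as "bool, default=True".
declared_type = bool

def parse(literal: str):
    """Turn a stored literal back into a Python value; "'True'" parses to the string 'True'."""
    try:
        return ast.literal_eval(literal)
    except (ValueError, SyntaxError):
        return literal  # keep unparseable literals as raw strings

# Drop usages whose value does not match the declared parameter type.
valid_counts = {
    literal: count
    for literal, count in value_counts.items()
    if isinstance(parse(literal), declared_type)
}

print(valid_counts)  # {'True': 311}
```

With the string values filtered out, only one distinct valid value remains, matching the expected @constant annotation with value True (boolean).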

Suggested Solution (optional)

No response

Additional Context (optional)

True: 311
'True': 3

lars-reimann commented 2 years ago

More generally: invalid values should be filtered out. In this case the call only works because a non-empty string is considered truthy in Python; even passing 'False' would have the same effect as passing True.
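
A minimal demonstration of the point (run against scikit-learn 1.1.1 as referenced above; newer releases add parameter validation and may reject string arguments):

```python
import numpy as np
from sklearn.preprocessing import scale

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Any non-empty string is truthy in Python:
print(bool("True"), bool("False"), bool(""))  # True True False

# scale() branches on the truthiness of with_mean, so passing the
# string 'False' behaves exactly like passing the boolean True:
print(scale(X, with_mean=True))
print(scale(X, with_mean="False"))  # same centered output
```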