Safe-DS / Library-Analyzer

Analysis of Python libraries and of code that uses them.
https://library-analyzer.safeds.com
MIT License

Default value 2**20 is not recognized, wrongly annotated as required #26

Open jofaul opened 2 years ago

jofaul commented 2 years ago

URL Hash

#/sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features

Actual Annotation Type

@required

Actual Annotation Inputs

{
    "target": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
    "authors": [
        "$autogen$"
    ]
}

Expected Annotation Type

@optional

Expected Annotation Inputs

2**20

Minimal API Data (optional)

Minimal API Data for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`

```json5
{
    "schemaVersion": 1,
    "distribution": "scikit-learn",
    "package": "sklearn",
    "version": "1.1.1",
    "modules": [
        {
            "id": "sklearn/sklearn.feature_extraction",
            "name": "sklearn.feature_extraction",
            "imports": [],
            "from_imports": [
                {
                    "module": "sklearn.feature_extraction",
                    "declaration": "text",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._dict_vectorizer",
                    "declaration": "DictVectorizer",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction._hash",
                    "declaration": "FeatureHasher",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "grid_to_graph",
                    "alias": null
                },
                {
                    "module": "sklearn.feature_extraction.image",
                    "declaration": "img_to_graph",
                    "alias": null
                }
            ],
            "classes": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher"
            ],
            "functions": []
        }
    ],
    "classes": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher",
            "name": "FeatureHasher",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher",
            "decorators": [],
            "superclasses": [
                "TransformerMixin",
                "BaseEstimator"
            ],
            "methods": [
                "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__"
            ],
            "is_public": true,
            "reexported_by": [
                "sklearn/sklearn.feature_extraction"
            ],
            "description": "Implements feature hashing, aka the hashing trick.\n\nThis class turns sequences of symbolic feature names (strings) into\nscipy.sparse matrices, using a hash function to compute the matrix column\ncorresponding to a name. The hash function employed is the signed 32-bit\nversion of Murmurhash3.\n\nFeature names of type byte string are used as-is. Unicode strings are\nconverted to UTF-8 first, but no Unicode normalization is done.\nFeature values must be (finite) numbers.\n\nThis class is a low-memory alternative to DictVectorizer and\nCountVectorizer, intended for large-scale (online) learning and situations\nwhere memory is tight, e.g. when running prediction code on embedded\ndevices.\n\nRead more in the :ref:`User Guide `.\n\n.. versionadded:: 0.13",
            "docstring": "Implements feature hashing, aka the hashing trick.\n\n This class turns sequences of symbolic feature names (strings) into\n scipy.sparse matrices, using a hash function to compute the matrix column\n corresponding to a name. The hash function employed is the signed 32-bit\n version of Murmurhash3.\n\n Feature names of type byte string are used as-is. Unicode strings are\n converted to UTF-8 first, but no Unicode normalization is done.\n Feature values must be (finite) numbers.\n\n This class is a low-memory alternative to DictVectorizer and\n CountVectorizer, intended for large-scale (online) learning and situations\n where memory is tight, e.g. when running prediction code on embedded\n devices.\n\n Read more in the :ref:`User Guide `.\n\n .. versionadded:: 0.13\n\n Parameters\n ----------\n n_features : int, default=2**20\n The number of features (columns) in the output matrices. Small numbers\n of features are likely to cause hash collisions, but large numbers\n will cause larger coefficient dimensions in linear learners.\n input_type : str, default='dict'\n Choose a string from {'dict', 'pair', 'string'}.\n Either \"dict\" (the default) to accept dictionaries over\n (feature_name, value); \"pair\" to accept pairs of (feature_name, value);\n or \"string\" to accept single strings.\n feature_name should be a string, while value should be a number.\n In the case of \"string\", a value of 1 is implied.\n The feature_name is hashed to find the appropriate column for the\n feature. The value's sign might be flipped in the output (but see\n non_negative, below).\n dtype : numpy dtype, default=np.float64\n The type of feature values. Passed to scipy.sparse matrix constructors\n as the dtype argument. Do not set this to bool, np.boolean or any\n unsigned integer type.\n alternate_sign : bool, default=True\n When True, an alternating sign is added to the features as to\n approximately conserve the inner product in the hashed space even for\n small n_features. This approach is similar to sparse random projection.\n\n .. versionchanged:: 0.19\n ``alternate_sign`` replaces the now deprecated ``non_negative``\n parameter.\n\n See Also\n --------\n DictVectorizer : Vectorizes string-valued features using a hash table.\n sklearn.preprocessing.OneHotEncoder : Handles nominal/categorical features.\n\n Examples\n --------\n >>> from sklearn.feature_extraction import FeatureHasher\n >>> h = FeatureHasher(n_features=10)\n >>> D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]\n >>> f = h.transform(D)\n >>> f.toarray()\n array([[ 0., 0., -4., -1., 0., 0., 0., 0., 0., 2.],\n [ 0., 0., 0., -2., -5., 0., 0., 0., 0., 0.]])\n "
        }
    ],
    "functions": [
        {
            "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__",
            "name": "__init__",
            "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__",
            "decorators": [],
            "parameters": [
                {
                    "id": "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features",
                    "name": "n_features",
                    "qname": "sklearn.feature_extraction._hash.FeatureHasher.__init__.n_features",
                    "default_value": "2**20",
                    "assigned_by": "POSITION_OR_NAME",
                    "is_public": true,
                    "docstring": {
                        "type": "int, default=2**20",
                        "description": "The number of features (columns) in the output matrices. Small numbers\nof features are likely to cause hash collisions, but large numbers\nwill cause larger coefficient dimensions in linear learners."
                    },
                    "type": {}
                }
            ],
            "results": [],
            "is_public": true,
            "reexported_by": [],
            "description": "",
            "docstring": ""
        }
    ]
}
```
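For reference, a minimal sketch of how a `default_value` string such as `"2**20"` could be reduced to a plain number. This is an illustration only, not the Library-Analyzer's actual code: `fold_constant` is a hypothetical helper name, and the set of accepted operators is an assumption. Note that `ast.literal_eval("2**20")` raises `ValueError`, so the sketch walks the parsed expression itself.

```python
import ast
import operator
from typing import Union

# Operators this sketch is willing to fold (an assumption; extend as needed).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def fold_constant(expression: str) -> Union[int, float]:
    """Evaluate an expression built only from numeric literals and basic arithmetic.

    Hypothetical helper: raises ValueError for names, calls, strings, booleans
    and any operator not listed above.
    """

    def visit(node: ast.AST) -> Union[int, float]:
        # Plain int/float literal (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary arithmetic on two foldable operands, e.g. 2**20.
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        # Leading sign, e.g. -3.
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"not a constant numeric expression: {expression!r}")

    return visit(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    print(fold_constant("2**20"))  # 1048576
```

Values such as `hash_vector_size` or `m` from the usage store are variable names, so this sketch rejects them with `ValueError` instead of guessing.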

Minimal Usage Store (optional)

Minimal Usage Store for `sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features`

```json5
{
    "schemaVersion": 1,
    "module_counts": {
        "sklearn/sklearn.feature_extraction": 693
    },
    "class_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher": 26
    },
    "function_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__": 22
    },
    "parameter_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": 11
    },
    "value_counts": {
        "sklearn/sklearn.feature_extraction._hash/FeatureHasher/__init__/n_features": {
            "6": 1,
            "8": 2,
            "12": 2,
            "20": 1,
            "100": 1,
            "hash_vector_size": 2,
            "m": 1,
            "2**18": 1,
            "2**20": 11
        }
    }
}
```
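To make the numbers above easier to read, here is a small sketch that summarizes the `value_counts` entry for `n_features`. The figures are copied from the usage store in this issue; the `summarize` helper and the comparison against the declared default are hypothetical and are not claimed to be the rule the Library-Analyzer applies.

```python
from collections import Counter

# Figures copied from the Minimal Usage Store in this issue.
value_counts = Counter(
    {
        "6": 1,
        "8": 2,
        "12": 2,
        "20": 1,
        "100": 1,
        "hash_vector_size": 2,
        "m": 1,
        "2**18": 1,
        "2**20": 11,
    }
)
declared_default = "2**20"  # "default_value" from the Minimal API Data


def summarize(counts: Counter, default: str) -> dict:
    """Hypothetical helper: report the most common value and whether it is the default."""
    total = sum(counts.values())
    value, count = counts.most_common(1)[0]
    return {
        "total": total,
        "most_common": value,
        "share": count / total,
        "matches_default": value == default,
    }


summary = summarize(value_counts, declared_default)

if __name__ == "__main__":
    print(summary)
```

The counted values add up to 22, the same as the `function_counts` entry for `__init__`, and the most common value is the string `2**20`, which is exactly the declared `default_value`.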

Suggested Solution (optional)

No response

Additional Context (optional)

(Image attached to the original issue; not reproduced here.)

Aclrian commented 2 years ago

This should also work for boundaries. See the closed issue Safe-DS/API-Editor#869 for details.