Add scikit-fingerprints

Hrovatin commented 2 months ago

Replace the mordred and rdkit fingerprints with scikit-fingerprints and enable the other fingerprints from the package. Aim to remove the rdkit and mordred installs.

Dev in: https://github.com/Hrovatin/baybe/tree/feature/scikit_fingerprints

Notes/Discuss:

[x] Functions that use RDKit but are not fingerprint related - do we keep RDKit then?
- is_valid_smiles: not used anywhere
- get_canonical_smiles
[x] New automatic fingerprint naming will not be backward-compatible
[x] mordred check in edbo - can this be used for any fingerprint (before it was mordred and rdkit)?
[x] Consider making Fingerprint enum a class to make code prettier (see TODOs in enum code) - EDIT: Not relevant anymore

Scienfitz commented 2 months ago

Thank you for taking this on. Summarizing our earlier conversation:

[x] Update the optional dependency group chem in pyproject.toml
[x] Update _optional/info.py and _optional/chem.py, in particular CHEM_INSTALLED
[x] Remove the smiles_to_*_features functions in baybe/utils/chemistry.py. You can probably replace them with a single function. It would likely also be possible to do without any such function and implement the logic directly in the substance parameter. However, we group all chemistry logic in this file so that it can be lazily imported, so I guess it is best to have one new utility function here
[x] Update the core logic and attributes/validators interfacing users via the SubstanceParameter in baybe/parameters/substance.py
[x] Automatically generate the enum SubstanceEncoding with all available choices in scikit-fingerprints. The enum likely has to be moved from baybe/parameters/enum.py to baybe/parameters/substance.py so the lazy import is still done
[x] Replace all usages of the old encodings as strings or enums with the new ones
[x] It appears the tests are already generic and don't need to be updated, but double-check this
[x] Update the userguide in docs/userguide/parameters.md with the new choices for the encoding
[x] Update and/or retest examples, in particular examples/Backtesting/full_lookup.py
[x] Add yourself to CONTRIBUTORS.md
[x] Mention this change in the CHANGELOG
[x] Double check and expand (if needed) the hypothesis strategies for substance parameters in tests/hypothesis_strategies/parameters.py
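
For the item about automatically generating SubstanceEncoding, a minimal sketch using the standard library's functional Enum API could look like this. The hard-coded AVAILABLE_FINGERPRINTS list is a hypothetical stand-in for whatever mechanism would list the classes offered by scikit-fingerprints, and the member-less ParameterEncoding base is only assumed here for illustration.

```python
from enum import Enum

# Hypothetical stand-in: in real code these names would be discovered from
# the scikit-fingerprints package instead of being hard-coded.
AVAILABLE_FINGERPRINTS = ["ECFPFingerprint", "MACCSFingerprint", "RDKitFingerprint"]


class ParameterEncoding(Enum):
    """Member-less base class so that generated enums share a common type."""


# Functional API: calling a member-less Enum class with `names=` creates a
# new enum whose members are built from (name, value) pairs.
SubstanceEncoding = ParameterEncoding(
    "SubstanceEncoding",
    names=[(name, name) for name in AVAILABLE_FINGERPRINTS],
)

print(SubstanceEncoding.ECFPFingerprint)
print(SubstanceEncoding("MACCSFingerprint"))
```

The members exist only at runtime, so tools that analyze the source statically have no list of names to work with.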

Scienfitz commented 2 months ago

It's strange that is_valid_smiles is not used somewhere, we definitely used to validate SMILES at some point.

But here I see that the values corresponding to SMILES are validated with different logic in @data.validator and not with is_valid_smiles. @AdrianSosic, any idea why?

Wouldn't value_validator=is_valid_smiles make the most sense? Or was there an issue with lazy loading?

AdrianSosic commented 2 months ago

It was refactored at some point to handle SMILES in canonical form, and the canonicalization also does the check internally (see validator method):

But the other function was kept because it's still useful in its own right.

Hrovatin commented 2 months ago

@Scienfitz I ran pytest -fast and there are two errors that I am not sure about. If you could provide some guidance, that would be great.

FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid5-parameter_names0] - AssertionError: ('Comp: ', 699840, 563760)
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...

One more question about the tests: do I need to run them separately in an env where CHEM is not installed?

Hrovatin commented 2 months ago

Also, do I need to do something to re-generate the documentation svgs or will it be done automatically? I guess examples/Backtesting/full_lookup.py creates some of these - should I run it?

Scienfitz commented 2 months ago

Can you please use tox -e fulltest-py310 and tox -e coretest-py312 (and also tox -e lint-py312 and tox -e mypy-py312 for the other checks)?

This will probably give you the same errors, but it rules out an environment misconfiguration.

I have a suspicion for the first error, but impossible to help without seeing the code. You can already open the PR in draft mode

Scienfitz commented 2 months ago

Don't worry about the pictures at this moment, they actually shouldn't change much if the fingerprints from the package are implemented identically.

AVHopp commented 2 months ago

Regarding pictures: Once everything else is fixed, just ping me about the pictures @Hrovatin . I can then give you a heads-up/we can discuss how to update pictures, but as Martin says, this is not really relevant at the moment.

Hrovatin commented 2 months ago

Test results. For mypy I need to do a few updates and will add once finished.

tox -p -e lint-py312
  lint-py312: OK (19.49=setup[1.84]+cmd[0.01,17.63] seconds)
  congratulations :) (19.81 seconds)

tox -p -e coretest-py312
  coretest-py312: OK (269.43=setup[72.26]+cmd[0.01,197.16] seconds)
  congratulations :) (269.77 seconds)

tox -p -e fulltest-py310
=================================================================================================== short test summary info ====================================================================================================
FAILED tests/docs/test_examples.py::test_example[examples/Serialization/basic_serialization.py] - subprocess.CalledProcessError: Command '['python', 'examples/Serialization/basic_serialization.py']' returned non-zero exit status 1.
FAILED tests/test_iterations.py::test_kernels[b3-grid5-i3-AdditiveKernel3] - torch._C._LinAlgError: linalg.eigh: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated eigenvalues (error code: 2).
FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid5-parameter_names0] - AssertionError: ('Comp: ', 699840, 563760)
FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid8-parameter_names0] - AssertionError: ('Comp: ', 1119744, 902016)
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AutocorrFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AvalonFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-E3FPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-ECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-ERGFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-EStateFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-FunctionalGroupsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-GETAWAYFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-GhoseCrippenFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-KlekotaRothFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LaggnerFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LayeredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LingoFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MACCSFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MAPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MHFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MORSEFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MQNsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MordredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PatternFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PharmacophoreFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PhysiochemicalPropertiesFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PubChemFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDFFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDKit2DDescriptorsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDKitFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-SECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-TopologicalTorsionFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-USRCATFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-USRFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-WHIMFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-DefaultFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AutocorrFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AvalonFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-E3FPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-ECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-ERGFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-EStateFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-FunctionalGroupsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-GETAWAYFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-GhoseCrippenFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-KlekotaRothFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LaggnerFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LayeredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LingoFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MACCSFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MAPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MHFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MORSEFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MQNsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MordredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PatternFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PharmacophoreFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PhysiochemicalPropertiesFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PubChemFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDFFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDKit2DDescriptorsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDKitFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-SECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-TopologicalTorsionFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-USRCATFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-USRFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-WHIMFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-DefaultFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
==================================================================================== 70 failed, 1561 passed, 4 skipped in 405.38s (0:06:45) ====================================================================================
fulltest-py310: exit 1 (410.36 seconds) /Users/karinhrovatin/Documents/code/baybe-Hrovatin> pytest -p no:warnings --cov=baybe --durations=5 pid=77919
  fulltest-py310: FAIL code 1 (413.63=setup[3.27]+cmd[0.00,410.36] seconds)
  evaluation failed :( (414.00 seconds)

Hrovatin commented 2 months ago

For mypy I have multiple issues with SubstanceEncoding, for which I would anyway suggest changes, as briefly mentioned above. So I did not resolve them for now.

baybe/parameters/enum.py:51: error: Unexpected keyword argument "names" for "ParameterEncoding"  [call-arg]
baybe/parameters/substance.py:60: error: Variable "baybe.parameters.enum.SubstanceEncoding" is not valid as a type  [valid-type]
baybe/parameters/substance.py:60: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
baybe/parameters/substance.py:60: error: No overload variant of "field" matches argument types "Any", "ParameterEncoding"  [call-overload]
baybe/parameters/substance.py:60: note: Possible overload variants:
baybe/parameters/substance.py:60: note:     def field(*, default: None = ..., validator: None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: None = ..., factory: None = ..., kw_only: bool = ..., eq: bool | None = ..., order: bool | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> Any
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: None = ..., validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> _T
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: _T, validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> _T
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: _T | None = ..., validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> Any
baybe/parameters/substance.py:61: error: "ParameterEncoding" has no attribute "DefaultFingerprint"  [attr-defined]
baybe/parameters/substance.py:61: error: Unsupported converter, only named functions, types and lambdas are currently supported  [misc]
baybe/parameters/substance.py:123: error: SubstanceEncoding? has no attribute "name"  [attr-defined]
Found 6 errors in 2 files (checked 102 source files)
mypy-py312: exit 1 (2.70 seconds) /Users/karinhrovatin/Documents/code/baybe-Hrovatin> mypy pid=82438
  mypy-py312: FAIL code 1 (5.70=setup[2.99]+cmd[0.01,2.70] seconds)
  evaluation failed :( (5.99 seconds)

Scienfitz commented 1 month ago

Functions that use RDKit but are not fingerprint related - do we keep RDKit then?

I think rdkit is a main dep of skfp so we do not have to decide and can keep all other funcs
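
Whether both packages are actually discoverable in a given environment can be checked without importing either of them. The sketch below uses importlib.util.find_spec; the helper name is_importable is hypothetical, and the flag only mirrors the idea behind CHEM_INSTALLED rather than the actual code in _optional/info.py.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# A flag in the spirit of CHEM_INSTALLED: chemistry features are considered
# available only if both packages are discoverable.
CHEM_INSTALLED = is_importable("skfp") and is_importable("rdkit")

print(f"chemistry extras available: {CHEM_INSTALLED}")
```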

New automatic fingerprint naming will not be backward-compatible

This should ideally be designed to coincide with the naming scheme, i.e. dropping capitalization and the "Fingerprint" suffix. We might need an alias/deprecation for the Morgan one.
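
A rough sketch of such a name mapping with a deprecated alias is shown below. Everything in it is an assumption for illustration: the function name normalize_encoding_name, the reading of "dropping capitalization" as case-insensitive matching onto upper-case names, the legacy string "MORGAN_FP", and the choice of "ECFP" as its replacement.

```python
import warnings

# Assumption: legacy encoding name -> new name. Only the Morgan alias is
# mentioned in the discussion; the exact strings here are illustrative.
_DEPRECATED_ALIASES = {"MORGAN_FP": "ECFP"}


def normalize_encoding_name(name: str) -> str:
    """Map a user-supplied encoding name onto an upper-case, suffix-free name."""
    key = name.upper()
    if key in _DEPRECATED_ALIASES:
        new = _DEPRECATED_ALIASES[key]
        warnings.warn(
            f"Encoding '{name}' is deprecated, use '{new}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new
    # Drop the trailing "Fingerprint" of the class name, ignoring case.
    suffix = "FINGERPRINT"
    if key.endswith(suffix) and len(key) > len(suffix):
        key = key[: -len(suffix)]
    return key


print(normalize_encoding_name("ECFPFingerprint"))
print(normalize_encoding_name("MordredFingerprint"))
```

With a mapping like this, class names such as RDKitFingerprint and MordredFingerprint reduce to RDKIT and MORDRED, so only the Morgan name would need an explicit alias.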

mordred check in edbo - can this be used for any fingerprint (before was mordred and rdkit)

Yes for now

Consider making Fingerprint enum a class to make code prettier (see TODOs in enum code)

Not sure what you mean, but Adrian raised one point: if we generate the enums automatically, would that destroy the tab completion when I type SubstanceEncoding.<TAB>? Can you check? If so, we should not generate the encoding automatically in this PR and leave it for a potential upcoming solution.
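
One possible middle ground, sketched here under assumptions, is to spell the members out by hand so that static tools can see them, and to check at import or test time that the hand-written list still matches what the package offers. The member names, the AVAILABLE_FINGERPRINTS set and the helper check_enum_is_complete are all hypothetical.

```python
from enum import Enum

# Hypothetical stand-in for the fingerprint names offered by the installed package.
AVAILABLE_FINGERPRINTS = {"ECFP", "MACCS", "RDKIT"}


class SubstanceEncoding(Enum):
    """Explicit members are visible to static analysis and completion."""

    ECFP = "ECFP"
    MACCS = "MACCS"
    RDKIT = "RDKIT"


def check_enum_is_complete(enum_cls: type[Enum], available: set[str]) -> None:
    """Raise if the hand-written enum and the available names have drifted apart."""
    declared = {member.name for member in enum_cls}
    missing = available - declared
    extra = declared - available
    if missing or extra:
        raise RuntimeError(f"missing={sorted(missing)}, extra={sorted(extra)}")


check_enum_is_complete(SubstanceEncoding, AVAILABLE_FINGERPRINTS)
```

The trade-off is manual upkeep: a new fingerprint in the package makes the check fail until the enum is extended.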

Scienfitz commented 1 month ago

Regarding Errors

AssertionError: ('Comp: ', 699840, 563760)

I suspect the missing dtype cast messes with the size estimation vs. the actual size in the memory test. E.g. if a fingerprint returns some of its columns as int, the estimation that these are all float32 doesn't hold anymore.
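
The effect can be illustrated with plain arithmetic. The numbers and column dtypes below are made up and are not meant to reproduce the 699840 vs. 563760 from the failing test; the helper names are hypothetical.

```python
# Bytes per value for a few common numeric dtypes.
ITEMSIZE = {"float32": 4, "float64": 8, "int64": 8}


def estimated_bytes(n_rows: int, n_cols: int, dtype: str = "float32") -> int:
    """Estimate that assumes every column uses the same dtype."""
    return n_rows * n_cols * ITEMSIZE[dtype]


def actual_bytes(n_rows: int, column_dtypes: list[str]) -> int:
    """Size when each column keeps whatever dtype the fingerprint returned."""
    return sum(n_rows * ITEMSIZE[dtype] for dtype in column_dtypes)


n_rows = 1000
# Hypothetical fingerprint output: two float columns and two integer columns.
dtypes = ["float32", "float32", "int64", "int64"]

print(estimated_bytes(n_rows, len(dtypes)))  # all columns assumed float32
print(actual_bytes(n_rows, dtypes))  # larger, int64 columns take 8 bytes each
print(actual_bytes(n_rows, ["float32"] * len(dtypes)))  # equal again after a cast
```

Casting every column to the dtype the estimate assumes makes the two numbers agree again.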

The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated eigenvalues (error code: 2). (and all other errors re numerics like decomposition, ill-defined matrix etc.)

Ignore, they appear 40% of the time at random

baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...

No clear idea. It seems like the overall construction of the parameter's computational representation comp_df is not correct. Did you look at some of those (and compare e.g. with the one you get from a non-substance parameter)?
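
As a starting point for such a comparison, a small diagnostic like the following could be run on the rows of a computational representation. It is only a sketch: the helper summarize_representation and the sample data are hypothetical, and whether duplicate rows or constant columns have anything to do with the NotEnoughPointsLeftError is an open assumption, not a diagnosis.

```python
def summarize_representation(rows: list[tuple[float, ...]]) -> dict[str, int]:
    """Count distinct rows and constant columns in a row-wise numeric table."""
    n_cols = len(rows[0]) if rows else 0
    constant_cols = sum(
        1 for j in range(n_cols) if len({row[j] for row in rows}) <= 1
    )
    return {
        "n_rows": len(rows),
        "n_unique_rows": len(set(rows)),
        "n_constant_cols": constant_cols,
    }


# Hypothetical encoded values for four substances: two rows collapse to the
# same vector and the last column never varies.
encoded = [(0.0, 1.0, 5.0), (0.0, 1.0, 5.0), (1.0, 0.0, 5.0), (1.0, 1.0, 5.0)]
print(summarize_representation(encoded))
```

Running the same summary on a non-substance parameter gives a baseline to compare against.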

emdgroup / baybe

Add scikit-fingerprints #359