NickCrews / mismo

The SQL/Ibis powered sklearn of record linkage
https://nickcrews.github.io/mismo/
GNU Lesser General Public License v3.0

Skip tests that require spacy when it is not installed #72

Closed lmores closed 2 weeks ago

lmores commented 1 month ago

As spacy is declared as an optional dependency, unit tests that rely on it should be skipped when it is not installed.

Note: when spacy is missing, the doctests for these files still fail, and I do not know how to skip them conditionally:

FAILED mismo/lib/geo/_address.py::mismo.lib.geo._address.AddressFeatures
FAILED mismo/lib/geo/_spacy.py::mismo.lib.geo._spacy.spacy_tag_address
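
One possible way to skip those doctests conditionally is sketched below. It relies on pytest's `collect_ignore` variable, which pytest reads from a `conftest.py` to exclude paths from collection (doctest modules included). The sketch assumes the `conftest.py` sits at the repository root, so the paths match the ones in the failure list; the names `DOCTEST_FILES_BY_EXTRA`, `is_installed`, and `files_to_ignore` are hypothetical and not part of mismo. For ordinary test modules, `pytest.importorskip("spacy")` would be the usual equivalent.

```python
# conftest.py -- a sketch, assumed to live at the repository root.
import importlib.util
from typing import Callable

# Hypothetical mapping: optional top-level package -> files whose doctests need it.
# The two paths are the ones reported as failing above.
DOCTEST_FILES_BY_EXTRA: dict[str, list[str]] = {
    "spacy": [
        "mismo/lib/geo/_address.py",
        "mismo/lib/geo/_spacy.py",
    ],
}


def is_installed(package: str) -> bool:
    """Return True if a top-level package is importable, without importing it."""
    return importlib.util.find_spec(package) is not None


def files_to_ignore(installed: Callable[[str], bool] = is_installed) -> list[str]:
    """Collect the files whose optional dependency is missing."""
    ignored: list[str] = []
    for package, files in DOCTEST_FILES_BY_EXTRA.items():
        if not installed(package):
            ignored.extend(files)
    return ignored


# pytest picks up this module-level list and does not collect the listed paths.
collect_ignore = files_to_ignore()
```

Note that this ignores each listed file entirely, so any doctests in those files that do not need spacy would be skipped as well.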

NickCrews commented 1 month ago

Spacy is optional for end users, but for developers I would like it to be required. As soon as it's optional, it's easier for different devs to have different environments, CI to have different environments from dev, tests to accidentally get skipped when they shouldn't, etc.

Have you recently resynced your dependencies using uv sync --all-extras, as shown in the justfile? If you run into issues installing spacy on your dev machine, we can reconsider and make spacy optional (similar to postal, which is difficult to install), but I don't think it should be difficult.

lmores commented 1 month ago

If spacy is required for development, shouldn't we add it to the dev-dependencies section in pyproject.toml?

With spacy installed, all tests pass except one (this is probably a functional issue; I am on 6e2dce97a8ae9f7e40006f4eb4d695913ee8b214):


===================================================================================== FAILURES ======================================================================================
_________________________________________________________________ [doctest] mismo.lib.geo._address.AddressFeatures __________________________________________________________________
054     np.False_
055 
056     You can still access the original fields that we didn't normalize:
057 
058     >>> features.raw.latitude.execute()
059     np.float64(123.456)
060 
061     For use in in preparing data for blocking, you can get all the features as a struct:
062 
063     >>> features.as_struct().execute()
Expected:
    {'street1': '132 MAIN ST', 'street2': 'APT 3B', 'city': 'SPRINGFIELD', 'state': None, 'postal_code': '12345', 'street_number': '132', 'street_name': 'MAIN', 'street_ngrams': ['132 ', 'MAIN', '32 M', 'AIN ', '2 MA', 'IN S', ' MAI', 'N ST', 'APT ', 'PT 3', 'T 3B', '132 MAIN ST', 'APT 3B'], 'latitude': 123.456}
Got:
    {'street1': '132 MAIN ST', 'street2': 'APT 3B', 'city': 'SPRINGFIELD', 'state': None, 'postal_code': '12345', 'taggings': [{'token': '132', 'label': 'AddressNumber'}, {'token': 'MAIN', 'label': 'StreetName'}, {'token': 'ST', 'label': 'StreetNamePostType'}], 'street_number': '132', 'street_number_sorted': '123', 'street_name': 'MAIN', 'street_ngrams': ['MAIN', '123', 'MAIN'], 'latitude': 123.456}

/home/lorenzo/devel/mismo/mismo/lib/geo/_address.py:63: DocTestFailure
============================================================================== short test summary info ==============================================================================
FAILED mismo/lib/geo/_address.py::mismo.lib.geo._address.AddressFeatures
=============================================================== 1 failed, 510 passed, 29 skipped, 7 xfailed in 59.90s ===============================================================

lmores commented 2 weeks ago

All tests pass on 140698a.