lmores closed 2 weeks ago
Spacy is optional for end users, but for developers I would like it to be required. As soon as it's optional, it's easier for different devs to have different environments, CI to have different environments from dev, tests to accidentally get skipped when they shouldn't, etc.
Have you recently resynced your dependencies using uv sync --all-extras, as shown in the justfile? If you run into issues installing spacy on your dev machine, we can reconsider and make spacy optional (similar to postal, which is difficult to install), but I don't think it should be difficult.
If spacy is required for development, shouldn't we add it to the dev-dependencies section in pyproject.toml?
With spacy installed, all tests pass except one (but this is probably a functional issue; I am on 6e2dce97a8ae9f7e40006f4eb4d695913ee8b214):
===================================================================================== FAILURES ======================================================================================
_________________________________________________________________ [doctest] mismo.lib.geo._address.AddressFeatures __________________________________________________________________
054 np.False_
055
056 You can still access the original fields that we didn't normalize:
057
058 >>> features.raw.latitude.execute()
059 np.float64(123.456)
060
061 For use in in preparing data for blocking, you can get all the features as a struct:
062
063 >>> features.as_struct().execute()
Expected:
{'street1': '132 MAIN ST', 'street2': 'APT 3B', 'city': 'SPRINGFIELD', 'state': None, 'postal_code': '12345', 'street_number': '132', 'street_name': 'MAIN', 'street_ngrams': ['132 ', 'MAIN', '32 M', 'AIN ', '2 MA', 'IN S', ' MAI', 'N ST', 'APT ', 'PT 3', 'T 3B', '132 MAIN ST', 'APT 3B'], 'latitude': 123.456}
Got:
{'street1': '132 MAIN ST', 'street2': 'APT 3B', 'city': 'SPRINGFIELD', 'state': None, 'postal_code': '12345', 'taggings': [{'token': '132', 'label': 'AddressNumber'}, {'token': 'MAIN', 'label': 'StreetName'}, {'token': 'ST', 'label': 'StreetNamePostType'}], 'street_number': '132', 'street_number_sorted': '123', 'street_name': 'MAIN', 'street_ngrams': ['MAIN', '123', 'MAIN'], 'latitude': 123.456}
/home/lorenzo/devel/mismo/mismo/lib/geo/_address.py:63: DocTestFailure
============================================================================== short test summary info ==============================================================================
FAILED mismo/lib/geo/_address.py::mismo.lib.geo._address.AddressFeatures
=============================================================== 1 failed, 510 passed, 29 skipped, 7 xfailed in 59.90s ===============================================================
All tests pass on 140698a.
As spacy is declared as an optional dependency, unit tests that rely on it should be skipped when it is not installed.
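For illustration, one common way to skip such tests is a reusable pytest marker built on an import check. This is only a sketch, not necessarily what this PR does: the names HAS_SPACY, requires_spacy and test_tagging_with_spacy are hypothetical.

```python
import importlib.util

import pytest

# True when the optional dependency can be found in this environment.
HAS_SPACY = importlib.util.find_spec("spacy") is not None

# Reusable marker: decorated tests are skipped when spacy is missing.
requires_spacy = pytest.mark.skipif(not HAS_SPACY, reason="spacy is not installed")


@requires_spacy
def test_tagging_with_spacy():  # hypothetical test, for illustration only
    import spacy  # imported lazily so collection works without spacy

    nlp = spacy.blank("en")  # blank pipeline: tokenizer only, no model download
    assert len(nlp("132 MAIN ST")) == 3
```

When a whole test module depends on spacy, calling pytest.importorskip("spacy") at the top of that module has a similar effect with less boilerplate.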
Note: when spacy is missing the doctest for these files still fails, but I do not know how to skip them conditionally:
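One possible approach, sketched here under assumptions, is to exclude the affected modules from collection in a conftest.py via pytest's collect_ignore list, which also keeps --doctest-modules from importing them. The module path below is a hypothetical placeholder and the helper name is_missing is made up for this example.

```python
# conftest.py (placed at the repository root in this sketch)
import importlib.util

# Hypothetical placeholder: list the modules whose doctests need spacy.
# Paths are relative to the directory containing this conftest.py.
SPACY_DOCTEST_MODULES = ["mismo/path/to/module_using_spacy.py"]


def is_missing(module_name: str) -> bool:
    """Return True if a top-level module cannot be found."""
    return importlib.util.find_spec(module_name) is None


# pytest reads this module-level list and skips collecting the listed paths.
collect_ignore = []
if is_missing("spacy"):
    collect_ignore.extend(SPACY_DOCTEST_MODULES)
```

The trade-off is that the list has to be maintained by hand whenever a new module gains a spacy-dependent doctest.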