HLasse / TextDescriptives

A Python library for calculating a large variety of metrics from text
https://hlasse.github.io/TextDescriptives/
Apache License 2.0

Quick start not working as expected #184

Closed by RichardLitt 1 year ago

RichardLitt commented 1 year ago

For the given file, filled in from the Quickstart, I am getting an error. Any thoughts on why this is failing?

$ cat test.py
import textdescriptives as td

text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
# will automatically download the relevant model (`en_core_web_lg`) and extract all metrics
df = td.extract_metrics(text=text, lang="en", metrics=None)

# specify spaCy model and which metrics to extract
df = td.extract_metrics(text=text, spacy_model="en_core_web_sm", metrics=["readability", "coherence"])

print(df)

$ python3 test.py
ℹ No spacy model provided. Inferring spacy model for en.
ℹ 'textdescriptives/descriptive_stats' component is required for
'textdescriptives.readability'. Adding to pipe.
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    df = td.extract_metrics(text=text, lang="en", metrics=None)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/textdescriptives/extractors.py", line 163, in extract_metrics
    nlp.add_pipe(f"textdescriptives/{component}")
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 791, in add_pipe
    validate=validate,
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 679, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 729, in resolve
    config, schema=schema, overrides=overrides, validate=validate, resolve=True
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 778, in _make
    config, schema, validate=validate, overrides=overrides, resolve=resolve
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 849, in _fill
    getter_result = getter(*args, **kwargs)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/textdescriptives/load_components.py", line 40, in create_textdescriptives_component
    nlp.add_pipe(component, last=True)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 775, in add_pipe
    raise ValueError(Errors.E007.format(name=name, opts=self.component_names))
ValueError: [E007] 'textdescriptives/descriptive_stats' already exists in pipeline. Existing names: ['tok2vec', 'tagger', 'parser', 'senter', 'attribute_ruler', 'lemmatizer', 'ner', 'textdescriptives/descriptive_stats', 'textdescriptives/readability', 'textdescriptives/coherence', 'textdescriptives/dependency_distance']

HLasse commented 1 year ago

Hi Richard, thanks for pointing this out! There was a bug when setting the metrics argument of extract_metrics to None. It is fixed in #186, which is now merged.
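
For readers unfamiliar with E007, here is a minimal sketch of how this kind of duplicate-name error can arise and how a membership check avoids it. It does not use spaCy and it is not the change made in #186, whose contents are not shown in this thread. TinyPipeline, REQUIRES, and the helper functions are hypothetical names invented for illustration; the only detail taken from the log is that readability requires descriptive_stats.

```python
class TinyPipeline:
    """Hypothetical stand-in for a pipeline; this is NOT spaCy's API."""

    def __init__(self):
        self.names = []

    def add(self, name):
        # Like the E007 error in the traceback: names must be unique.
        if name in self.names:
            raise ValueError(
                f"'{name}' already exists in pipeline. Existing names: {self.names}"
            )
        self.names.append(name)


# Assumed dependency table; only this one entry is suggested by the log output.
REQUIRES = {"readability": ["descriptive_stats"]}


def add_with_dependencies(pipe, name):
    """Add any missing dependencies of `name` first, then `name` itself."""
    for dependency in REQUIRES.get(name, []):
        if dependency not in pipe.names:
            pipe.add(dependency)
    pipe.add(name)


def add_all_naive(pipe, components):
    """Adds every requested component, even if a dependency step already added it."""
    for component in components:
        add_with_dependencies(pipe, component)


def add_all_guarded(pipe, components):
    """Skips components that are already present in the pipeline."""
    for component in components:
        if component not in pipe.names:
            add_with_dependencies(pipe, component)


requested = ["readability", "descriptive_stats"]

try:
    add_all_naive(TinyPipeline(), requested)
except ValueError as error:
    print("naive:", error)

guarded = TinyPipeline()
add_all_guarded(guarded, requested)
print("guarded:", guarded.names)
```

In the naive loop, adding readability pulls in descriptive_stats, so the later attempt to add descriptive_stats by name fails. With spaCy itself, a comparable guard would check `name in nlp.pipe_names` before calling `nlp.add_pipe(name)`.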

RichardLitt commented 1 year ago

Still having this issue.

➜  textdesc-test python3 test.py
ℹ No spacy model provided. Inferring spacy model for en.
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    df = td.extract_metrics(text=text, lang="en", metrics=None)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/textdescriptives/extractors.py", line 163, in extract_metrics
    nlp.add_pipe(f"textdescriptives/{component}")
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 791, in add_pipe
    validate=validate,
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 679, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 729, in resolve
    config, schema=schema, overrides=overrides, validate=validate, resolve=True
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 778, in _make
    config, schema, validate=validate, overrides=overrides, resolve=resolve
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/confection/__init__.py", line 849, in _fill
    getter_result = getter(*args, **kwargs)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/textdescriptives/load_components.py", line 40, in create_textdescriptives_component
    nlp.add_pipe(component, last=True)
  File "/Users/richard/Library/Python/3.7/lib/python/site-packages/spacy/language.py", line 775, in add_pipe
    raise ValueError(Errors.E007.format(name=name, opts=self.component_names))
ValueError: [E007] 'textdescriptives/dependency_distance' already exists in pipeline. Existing names: ['tok2vec', 'tagger', 'parser', 'senter', 'attribute_ruler', 'lemmatizer', 'ner', 'textdescriptives/dependency_distance', 'textdescriptives/coherence']

HLasse commented 1 year ago

That's odd. I can't reproduce this in a fresh virtual environment with textdescriptives version 2.4.3 (from pip). Which version are you using?
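
One way to answer the version question from Python is sketched below. It assumes Python 3.8 or newer, where importlib.metadata is in the standard library; the Python 3.7 install shown in the traceback would need `pip show textdescriptives` or the importlib_metadata backport instead. The helpers installed_version, numeric_parts, and is_at_least are hypothetical names, and the comparison is a simplification that only looks at the leading digits of each dotted part. The 2.4.3 threshold is simply the version reported as working above.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def numeric_parts(version_string):
    """Turn '2.4.3' into (2, 4, 3), keeping only the leading digits of each part."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(found, minimum):
    """Compare two dotted version strings numerically (simplified, no pre-release rules)."""
    return numeric_parts(found) >= numeric_parts(minimum)


found = installed_version("textdescriptives")
if found is None:
    print("textdescriptives is not installed in this environment")
elif is_at_least(found, "2.4.3"):
    print(f"textdescriptives {found}: at or above 2.4.3")
else:
    print(f"textdescriptives {found}: older than 2.4.3, consider upgrading with pip")
```

If the reported version is older, `pip install --upgrade textdescriptives` fetches the latest release from PyPI.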

RichardLitt commented 1 year ago

Updated. Works now. Thanks!