biolab / orange3-educational

šŸŠ šŸŽ“ Educational widgets for machine learning and data mining in Orange 3.

Text from Create Table cannot be used for text mining #169

Open wvdvegte opened 1 month ago

wvdvegte commented 1 month ago
Educational version

0.8.0

Orange version

3.37.0

Expected behavior

Text mining on text entered in Create Table should be possible: use Edit Domain to force the column to be interpreted as text (rather than as categorical data), connect the Edit Domain output to Corpus, and select the text variable under "Used text features".

Actual behavior

Connecting Corpus to Edit Domain results in an error:

Error encountered in widget Corpus:

Traceback (most recent call last):
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/widgets/owcorpus.py", line 336, in update_feature_selection
    corpus = self.corpus.copy()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 481, in copy
    c = super().copy()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 1491, in copy
    t = self.__class__(self)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 71, in __new__
    return super().__new__(cls, *args, **kwargs)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 718, in __new__
    return cls.from_table(args[0].domain, args[0], **kwargs)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 558, in from_table
    Corpus.retain_preprocessing(source, c, row_indices)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 649, in retain_preprocessing
    new.text_features = list(filter(None, [
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 650, in 
    new._find_identical_feature(tf)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 129, in _find_identical_feature
    var == feature
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/variable.py", line 418, in __eq__
    and var1._compute_value == var2._compute_value
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/preprocess/transformation.py", line 240, in __eq__
    and np.allclose(self.lookup_table, other.lookup_table,
  File "", line 180, in allclose
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/numeric.py", line 2265, in allclose
    res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
  File "", line 180, in isclose
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/numeric.py", line 2372, in isclose
    xfin = isfinite(x)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

If the error is dismissed and the text variable is then selected under "Used text features", Corpus ignores the selection and outputs the default corpus, book-excerpts.tab, instead.
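The last frames of the traceback show `np.allclose` being called on two `lookup_table` arrays and failing in a numeric check. A minimal sketch of that failure outside Orange, assuming the lookup tables hold non-numeric (object dtype) values such as strings; the arrays `a` and `b` are hypothetical stand-ins, and the issue does not confirm what the real lookup tables contain:

```python
import numpy as np

# Hypothetical stand-ins for two lookup_table arrays; the string contents
# and object dtype are assumptions, not taken from the issue.
a = np.array(["first text", "second text"], dtype=object)
b = np.array(["first text", "second text"], dtype=object)

# np.allclose does floating-point arithmetic on its inputs,
# which is not defined for arrays of strings.
try:
    np.allclose(a, b, equal_nan=True)
    raised = False
except TypeError as err:
    raised = True
    print("allclose failed:", type(err).__name__)

# An element-wise equality check does not need numeric operations.
same = np.array_equal(a, b)
print("array_equal:", same)
```

This only illustrates why a tolerance-based comparison cannot be applied to non-numeric arrays; whether an exact comparison is the right fix for Orange's `Lookup.__eq__` is for the maintainers to decide.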

Steps to reproduce the behavior

Open the attached workflow, Create table with text.ows.zip, and connect Corpus to Edit Domain to reproduce the behavior described above.

janezd commented 1 month ago

@ajdapretnar, could you take a look?