ealcobaca / pymfe

Python Meta-Feature Extractor package.
https://pymfe.readthedocs.io
MIT License

[BUG] Unable to run MFE for datasets of more than ~500 features #130

Open schmitcn opened 1 year ago

schmitcn commented 1 year ago

Describe the bug When running MFE with the general group on a dataset with more than roughly 500 features, the call fails with RecursionError: maximum recursion depth exceeded while calling a Python object.

To Reproduce Steps to reproduce the behavior:

        from pymfe.mfe import MFE

        mfe = MFE(groups=["general"])
        mfe.fit(X, y)  # where X has more than 500 features

Expected behavior Generate the general meta-features.

Screenshots N/A

Desktop (please complete the following information):

Additional context The stack trace is as follows:

  File "[...]/lib/python3.8/site-packages/patsy/desc.py", line 400, in eval
    result = self._evaluators[key](self, tree)
  File "[...]/lib/python3.8/site-packages/patsy/desc.py", line 233, in _eval_binary_plus
    left_expr = evaluator.eval(tree.args[0])
  File "[...]/lib/python3.8/site-packages/patsy/desc.py", line 400, in eval
    result = self._evaluators[key](self, tree)
  File "[...]/lib/python3.8/site-packages/patsy/desc.py", line 233, in _eval_binary_plus
    left_expr = evaluator.eval(tree.args[0])
  File "[...]/lib/python3.8/site-packages/patsy/desc.py", line 394, in eval
    assert isinstance(tree, ParseNode)
RecursionError: maximum recursion depth exceeded while calling a Python object

The failure comes from patsy and appears to be related to an issue reported in their repository. It has not been fixed and the maintainers do not intend to fix it, since formulaic, the successor of patsy, already solves it. My suggestion would be to migrate to formulaic, as patsy is no longer under active development (as stated in its README).
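For illustration, the alternating eval / _eval_binary_plus frames in the trace are consistent with a long chain of "+" terms being evaluated recursively, one pair of frames per term. With CPython's default recursion limit of 1000, that would put the threshold near 500 terms. The sketch below is a toy model of that pattern, not patsy's actual code; Node, build_sum_tree, eval_tree, eval_binary_plus and fails_with_recursion_error are hypothetical names made up for this example.

```python
import sys


class Node:
    """Toy binary '+' node: left is a subtree or a name, right is a name."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def build_sum_tree(n_terms):
    """Build the left-nested tree for 'x0 + x1 + ... + x{n_terms - 1}'."""
    tree = "x0"
    for i in range(1, n_terms):
        tree = Node(tree, f"x{i}")
    return tree


def eval_tree(tree):
    """Dispatch on node type, loosely mirroring an evaluator's eval step."""
    if isinstance(tree, str):
        return [tree]
    return eval_binary_plus(tree)


def eval_binary_plus(tree):
    """Evaluate both sides recursively; the left side adds two frames per term."""
    return eval_tree(tree.left) + eval_tree(tree.right)


def fails_with_recursion_error(n_terms):
    """Return True if evaluating an n_terms-long sum exhausts the call stack."""
    tree = build_sum_tree(n_terms)
    try:
        eval_tree(tree)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    limit = sys.getrecursionlimit()
    print("recursion limit:", limit)
    print("50 terms fail:", fails_with_recursion_error(50))
    print(f"{limit} terms fail:", fails_with_recursion_error(limit))
```

In this toy model the depth grows linearly with the number of terms, so a wide enough formula always runs into the limit, whatever that limit is set to.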

schmitcn commented 1 year ago

Hi @ealcobaca, @FelSiq,

Are there any plans to address this anytime soon? If not, that's fine; I just need to know for project planning purposes (so that we can look for a different tool).

Best regards.

FelSiq commented 1 year ago

Hi @schmitcn

Sorry for the delay. We won't be addressing this issue soon, but there might be a solution.

Have you tried calling mfe.fit(X, ..., transform_cat="one-hot")? This should avoid the patsy dependency by encoding categorical variables with a different method.

Thanks for your feedback.

Best regards, Felipe.