automl / TabPFN

Official implementation of the TabPFN paper (https://arxiv.org/abs/2207.01848) and the tabpfn package.
http://priorlabs.ai
Apache License 2.0
1.22k stars 109 forks

setup.py dependencies #13

Closed Innixma closed 2 years ago

Innixma commented 2 years ago

setup.py seems to install more dependencies than should be necessary for this model to function. Would it make sense to instead have a tabpfn[benchmark] extra dependencies option akin to tabpfn[baselines]?

      install_requires=[
        'gpytorch>=1.5.0',
        'torch>=1.9.0',
        'scikit-learn>=0.24.2',
        'pyyaml>=5.4.1',
        'seaborn>=0.11.2',
        'xgboost>=1.4.0',
        'tqdm>=4.62.1',
        'numpy>=1.21.2',
        'openml>=0.12.2',
        'catboost>=0.26.1',
        'auto-sklearn>=0.14.5',
        'hyperopt>=0.2.5',
        'configspace>=0.4.21',
      ],

Currently it is unclear how to clone TabPFN and run it from source without installing all of these dependencies.

Perhaps it would be ideal to instead have an entirely separate repository for benchmarking TabPFN so that CatBoost etc. have nothing to do with this repo. This would help a lot in terms of code cleanliness and separation of concerns.
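The split suggested above could be sketched with setuptools' `extras_require`. This is only an illustration of the proposal, not the repo's actual setup.py; the `benchmark` extra name and the core/benchmark grouping are assumptions taken from this thread:

```python
# Hypothetical split of the install_requires list above:
# deps needed to run the model vs. benchmarking deps behind an extra.
core = [
    'gpytorch>=1.5.0',
    'torch>=1.9.0',
    'scikit-learn>=0.24.2',
    'pyyaml>=5.4.1',
    'tqdm>=4.62.1',
    'numpy>=1.21.2',
]
benchmark = [
    'seaborn>=0.11.2',
    'xgboost>=1.4.0',
    'openml>=0.12.2',
    'catboost>=0.26.1',
    'auto-sklearn>=0.14.5',
    'hyperopt>=0.2.5',
    'configspace>=0.4.21',
]

SETUP_KWARGS = dict(
    name='tabpfn',
    install_requires=core,
    extras_require={'benchmark': benchmark},
)
# In setup.py: from setuptools import setup; setup(**SETUP_KWARGS)
# pip install tabpfn              -> core only
# pip install "tabpfn[benchmark]" -> core plus the benchmarking stack
```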

Innixma commented 2 years ago

Note that the current pip install TabPFN installs the following:

Successfully installed configspace-0.6.0 gpytorch-1.8.1 hyperopt-0.2.7 liac-arff-2.5.0 minio-7.1.12 openml-0.12.2 py4j-0.10.9.7 tabpfn-0.1.5 xmltodict-0.13.0

I don't think configspace, openml, or hyperopt should be necessary (at a minimum). configspace is not supported on Windows, and if I were to seriously try integrating this into a system like AutoGluon, TabPFN would need to have only the minimum dependencies required to function to avoid bloat.

SamuelGabriel commented 2 years ago

Hi there :)

setup.py was very bloated indeed. We have switched to a simplified setup.py that reads from pyproject.toml, so `pip install -e .` now installs the same dependencies as `pip install tabpfn`. Thanks for pointing this out. See https://github.com/automl/TabPFN/blob/main/setup.py

Reducing the dependencies further would make sense, too, we believe. Right now the dependencies are chosen such that you can re-train your own TabPFN, but probably most users need less.

Innixma commented 2 years ago

You could use the `extras_require` functionality of setup.py so that `pip install tabpfn` pulls in only the minimal inference dependencies and `pip install tabpfn[train]` includes the full set.

I am mostly interested in using the model purely for inference, not for training.
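Since the repo already uses pyproject.toml, the same inference/train split could also be expressed there via `[project.optional-dependencies]`. A sketch under the thread's assumptions (the `train` extra name and the grouping are hypothetical, not the repo's actual file):

```toml
# pyproject.toml sketch (illustrative only):
# minimal inference deps by default, training deps behind an extra.
[project]
name = "tabpfn"
dependencies = [
    "torch>=1.9.0",
    "scikit-learn>=0.24.2",
    "numpy>=1.21.2",
]

[project.optional-dependencies]
train = [
    "gpytorch>=1.5.0",
    "openml>=0.12.2",
    "hyperopt>=0.2.5",
    "configspace>=0.4.21",
]
```

With this layout, `pip install tabpfn` installs only the inference dependencies, and `pip install "tabpfn[train]"` adds the training stack.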

SamuelGabriel commented 2 years ago

Sounds good! Most people are, I guess. I will let you know when we have progress on separating concerns in our dependencies.

SamuelGabriel commented 2 years ago

I could remove most non-standard dependencies without much restructuring and have updated the pip package accordingly. We are now down to the following dependencies:

dependencies=[
        'gpytorch>=1.5.0',
        'torch>=1.9.0',
        'scikit-learn>=0.24.2',
        'pyyaml>=5.4.1',
        'numpy>=1.21.2',
        'requests>=2.23.0',
]

gpytorch could also be removed in a second step, though that requires somewhat deeper changes; I'm not sure it is crucial. What do you say?

Innixma commented 2 years ago

That is awesome! If gpytorch could be removed, that would be very helpful: gpytorch depends on linear_operator>=0.1.1, a beta project with 17 GitHub stars that doesn't support Python 3.7. This is not workable for AutoGluon, as we wish to continue supporting Python 3.7.

Once gpytorch is removed, the only added dependency would be TabPFN itself, which is ideal.

SamuelGabriel commented 2 years ago

We fixed this, too, now :) The gpytorch dependency is gone. I will upload this to pip in the coming days (it is already on main though). Thanks to @David-Schnurr

Innixma commented 2 years ago

Awesome!!!!! Please let me know when the pip release is available, I'd love to try it out

SamuelGabriel commented 2 years ago

We pushed this to pip. Feel free to give it a try :)

Innixma commented 2 years ago

Nice!! Will do