blei-lab / treeffuser

Treeffuser is an easy-to-use package for probabilistic prediction on tabular data with tree-based diffusion models.
https://blei-lab.github.io/treeffuser/
MIT License

Benchmark with LightGBMLSS / XGBoostLSS #114

Open KiwiAthlete opened 4 weeks ago

KiwiAthlete commented 4 weeks ago

Thanks for making this repo available, very interesting.

I have used LightGBMLSS for many projects and it works quite well, especially since, besides parametric distributions, it can also use normalizing flows and mixture densities to estimate the distribution.

I was wondering why there is no comparison with it in the paper or in your repo. Are there any plans to add one?

Thanks a lot!

velezbeltran commented 3 weeks ago

Hello @KiwiAthlete,

I was aware of LightGBMLSS before it added normalizing flows and mixture densities, which is why we didn't think of adding it. That is cool! We tried a mixture of normals in preliminary tests, but it was not working well, so we didn't add it to the paper (we didn't use this library but a modified version of NGBoost that we wrote).

It would be nice to add it as a comparison to the repo and see how it stacks up. However, it is a bit late to add it to the final version of the paper as today is the camera-ready deadline. We are a bit busy at the moment so not sure we will get to this super soon. If we do, I'll post a comment here so that you can take a look. If you want to open a PR and add it to the testbed I would be happy to take a look as well.

Thank you for bringing this up!

KiwiAthlete commented 1 week ago

@velezbeltran Thanks for your comments. Please find my replies below.

> I was aware of LightGBMLSS before it added normalizing flows or mixture densities, that's why we didn't think of adding it. That is cool! We tried with mixture of normals in preliminary tests but it was not working well so we didn't add it to the paper (we didn't use this library but a modified version of NGBoost that we wrote).

NGBoost and LightGBMLSS are quite different, especially since NGBoost can be a little unstable and slow. Not sure if NGBoost has normalizing flows or mixture densities implemented. Have you used LightGBMLSS's implementation for this?

> It would be nice to add it as a comparison to the repo and see how it stacks up. However, it is a bit late to add it to the final version of the paper as today is the camera-ready deadline. We are a bit busy at the moment so not sure we will get to this super soon. If we do, I'll post a comment here so that you can take a look. If you want to open a PR and add it to the testbed I would be happy to take a look as well.

I'd be happy to run some experiments, but I won't commit to a timeline. To make the results reproducible, could you please share the training scripts or point me to them in your repo?

Thanks!

velezbeltran commented 1 week ago

Hello!

Sorry about the late response.

We haven't used the LightGBMLSS implementation, so we are also curious to see how the comparison turns out. NGBoost doesn't have normalizing flows or mixture densities. We implemented the latter ourselves by extending NGBoost.

That would be great, and we would be excited to see how it works! If you want to add the benchmarks, everything you need is under ./testbed. Essentially, you need to extend the class in testbed/src/testbed/models/base_model.py and then add your model to testbed/src/testbed/run_simulated_datasets.py. This should let you use the same training scripts that we used and compare your model against any other model.
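To give a rough idea of the pattern (extend a base class, then register the new model with the run script), here is a minimal, self-contained sketch. It is not the actual testbed code: the class name `BaseProbabilisticModel`, the `fit`/`sample` interface, the `MODELS` registry, and the `run` helper are all hypothetical stand-ins, so check base_model.py and run_simulated_datasets.py for the real names and signatures. The Gaussian placeholder model only keeps the example runnable without extra dependencies; a LightGBMLSS wrapper would go in its place.

```python
import random
import statistics
from abc import ABC, abstractmethod


class BaseProbabilisticModel(ABC):
    """Hypothetical stand-in for the class in testbed/src/testbed/models/base_model.py."""

    @abstractmethod
    def fit(self, X, y):
        """Train on features X (list of rows) and targets y (list of floats)."""

    @abstractmethod
    def sample(self, X, n_samples, seed=0):
        """Return n_samples lists, each holding one draw of y per row of X."""


class GaussianBaseline(BaseProbabilisticModel):
    """Placeholder model (assumption): ignores X and fits one normal to y.

    A real LightGBMLSS wrapper would train and sample with that library here.
    """

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        self.std_ = statistics.pstdev(y)
        return self

    def sample(self, X, n_samples, seed=0):
        rng = random.Random(seed)
        return [
            [rng.gauss(self.mean_, self.std_) for _ in X]
            for _ in range(n_samples)
        ]


# Hypothetical registry, standing in for however the run script lists its models.
MODELS = {"gaussian_baseline": GaussianBaseline}


def run(model_name, X_train, y_train, X_test, n_samples=100):
    """Look up a model by name, fit it, and draw predictive samples."""
    model = MODELS[model_name]().fit(X_train, y_train)
    return model.sample(X_test, n_samples)


if __name__ == "__main__":
    X_train = [[0.0], [1.0], [2.0], [3.0]]
    y_train = [1.0, 2.0, 3.0, 4.0]
    samples = run("gaussian_baseline", X_train, y_train, [[4.0], [5.0]], n_samples=50)
    print(len(samples), len(samples[0]))
```

The point of keeping every model behind the same two methods is that the shared training and evaluation scripts never need to know which library sits underneath, which is what makes the comparison against other models straightforward.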

Let us know if we can help :)

Nicolas