Closed amueller closed 1 year ago
Downgrading to seaborn 0.11 yields:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
I had changed device to 'cuda'; changing it to 'cpu' makes it work.
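For anyone who wants to keep the model on the GPU, the error message itself points at the usual workaround: copy the tensor to host memory before it reaches NumPy or plotting code. A minimal sketch, assuming the value in question is a torch.Tensor (the helper name to_numpy is made up for illustration and is not part of TabPFN):

```python
def to_numpy(tensor):
    """Return a NumPy array for a torch.Tensor that may live on a GPU.

    Hypothetical helper, not part of TabPFN.
    """
    # detach() drops autograd tracking, cpu() copies the data to host memory
    # (it returns the same tensor if it is already on the CPU), and numpy()
    # converts the host tensor to an ndarray.
    return tensor.detach().cpu().numpy()
```

Wrapping the tensors that go into the plotting calls with this helper should avoid the TypeError without switching the device back to 'cpu'.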
Another question: is there a way to do multi-GPU training using the scripts from the notebook you provide? I don't see any code to spawn workers; it looks like init_dist requires using torchrun?
Thanks for the update, Andreas. The version in the requirements should work then, I guess. :)
Yes, there is. We used submitit to run all our experiments, since we have a SLURM cluster.
Our parallelization is heavily inspired by this repo: https://github.com/facebookresearch/dino
If you have a SLURM cluster: you can make the train call with executor.submit, and you can simply update the parameters of the executor to schedule a multi-GPU job:
executor.update_parameters(
    gpus_per_node=8,
    tasks_per_node=8,  # one task per GPU
)
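For readers who have not used submitit before, the snippet above could be embedded in something like the following sketch. It assumes submitit is installed and a SLURM cluster is reachable. The helper names, the log folder, and the timeout are placeholders chosen for illustration, not names from the TabPFN repository, and train_fn stands in for whatever training function the notebook calls.

```python
def multi_gpu_parameters(n_gpus: int) -> dict:
    """Executor parameters for one node with one task (process) per GPU."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return {"nodes": 1, "gpus_per_node": n_gpus, "tasks_per_node": n_gpus}


def submit_training(train_fn, n_gpus: int = 8, log_folder: str = "submitit_logs"):
    """Submit train_fn as a single-node, multi-GPU SLURM job (hypothetical helper)."""
    # Imported lazily so that multi_gpu_parameters works without submitit installed.
    import submitit

    executor = submitit.AutoExecutor(folder=log_folder)  # logs are written here
    executor.update_parameters(
        timeout_min=60,  # placeholder wall-clock limit in minutes
        **multi_gpu_parameters(n_gpus),
    )
    # submit() schedules the job and returns a job object that can be awaited.
    return executor.submit(train_fn)
```

Calling submit_training(train_fn) returns a submitit job; job.results() waits for all tasks to finish.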
If not: launching with torchrun should also work out of the box, as I wrote some code to handle it, but I am not 100% sure, as we have not used this in a while.
The important code is here: https://github.com/automl/TabPFN/blob/f02c093c101f80cb4f462f834c22456bbd3c1e84/tabpfn/utils.py#L238
The code does not support multi-node training, though.
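To illustrate what such launcher handling typically looks like, here is a rough sketch in the style of the DINO repo linked above. It is not the actual code in tabpfn/utils.py, and the names DistInfo and detect_dist are invented for this example. It relies only on environment variables that the launchers are known to set: torchrun exports RANK, WORLD_SIZE and LOCAL_RANK, and SLURM exports SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global index of this process
    world_size: int  # total number of processes
    local_rank: int  # index of this process on its node (usually the GPU index)
    launcher: str    # "torchrun" or "slurm"


def detect_dist(env: Mapping[str, str] = os.environ) -> Optional[DistInfo]:
    """Work out the distributed setup from the environment, or None if single-process."""
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for every worker it spawns.
    if "RANK" in env and "WORLD_SIZE" in env:
        return DistInfo(
            rank=int(env["RANK"]),
            world_size=int(env["WORLD_SIZE"]),
            local_rank=int(env.get("LOCAL_RANK", 0)),
            launcher="torchrun",
        )
    # SLURM (for example via submitit) sets one SLURM_PROCID per task.
    if "SLURM_PROCID" in env:
        return DistInfo(
            rank=int(env["SLURM_PROCID"]),
            world_size=int(env.get("SLURM_NTASKS", 1)),
            local_rank=int(env.get("SLURM_LOCALID", 0)),
            launcher="slurm",
        )
    return None
```

With torchrun, a script would typically call detect_dist() once at startup, select the GPU given by local_rank, and then initialise torch.distributed; whether TabPFN's init_dist does exactly this is best checked against the linked source.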
> Thanks for the update, Andreas. The version in the requirements should work then, I guess. :)
Oh, I thought maybe the requirements file was consumed by the setup.py, as the installation instructions only mention the pip install. It would be great to have end-to-end instructions for reproducing the training.
Thanks for the pointer to submitit, I'll check out how it works. I don't have a SLURM cluster, I have a cloud ;) I'm currently using torchrun.
Did you get this far, installing from pip? I did not expect this to work tbh and thought one needs to install from requirements to train. I will add the requirement to the setup, thanks! :)
Oh yeah I didn't touch the requirements.txt, it wasn't mentioned anywhere.
I think adding requirements.txt to setup is a bad habit, but many people do it. Having maybe one section for installing for using the model and one for reproducing the training would be great.
Yeah, I won’t add the full requirements. No worries :) I will just add seaborn <=0.12
On 30. Jan 2023, at 18:46, Andreas Mueller wrote:
> Oh yeah I didn't touch the requirements.txt, it wasn't mentioned anywhere.
> I think adding requirements.txt to setup is a bad habit, but many people do it. Having maybe one section for installing for using the model and one for reproducing the training would be great.
No, <=0.11
I've tried running the PriorFittingCustomPrior.ipynb and ran into some difficulties. It seems lightgbm is getting imported, but it's not part of the pyproject.toml. Also, seaborn 0.12.2 raises an error when plotting:
ValueError: The following variable cannot be assigned with wide-form data: `hue`
It would be awesome to get a conda environment with a working config. I also didn't see the Python 3.7 requirement at first, since it's only mentioned in the requirements.txt.