anndvision / causal-bald


Strong Interests in Your Code #21

Open uqhwen2 opened 11 months ago

uqhwen2 commented 11 months ago

Hi mate, your paper is so interesting and solid.

I want to test the code, but I can't figure out how to get it started properly (even with the detailed instructions) :(.

I was looking for a single line of bash syntax to run your code.

Your instructions give:

causal-bald \
    active-learning \
    --job-dir experiments/ \
    --num-trials 5 \
    --step-size 10 \
    --warm-start-size 100 \
    --max-acquisitions 38 \
    --acquisition-function random \
    --temperature 0.25 \
    --gpu-per-trial 0.2 \
    ihdp \
    --root assets/ \
    deep-kernel-gp

I have no idea how to run this in a Linux terminal (copy and paste returns nothing haha). I tried running main.py, no luck :(.

Thanks in advance for any response!

Cheers

anndvision commented 11 months ago

hey, thanks for trying out the code!

do you have a screenshot of what happens when you try to run the code?

it should work OK to copy and paste, but you can always try running it on a single line, like

causal-bald active-learning --job-dir experiments/ --num-trials 5 --step-size 10 --warm-start-size 100 --max-acquisitions 38 --acquisition-function random --temperature 0.25 --gpu-per-trial 0.2 ihdp --root assets/ deep-kernel-gp
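
roughly, the pieces break down like this (same command again, annotated; the one-line descriptions are just a reading of the option names, causal-bald --help has the authoritative text):

# causal-bald             the installed CLI entry point
# active-learning         run the active-learning experiment loop
# --job-dir               directory where trial results are written
# --num-trials            number of repeated trials (run in parallel via ray)
# --step-size             points acquired per acquisition step
# --warm-start-size       size of the initial labelled pool
# --max-acquisitions      number of acquisition rounds
# --acquisition-function  strategy for selecting new points (here: random)
# --gpu-per-trial         fraction of a GPU reserved for each ray trial
# ihdp                    dataset subcommand, --root points at the data assets
# deep-kernel-gp          model subcommand
causal-bald active-learning --job-dir experiments/ --num-trials 5 \
    --step-size 10 --warm-start-size 100 --max-acquisitions 38 \
    --acquisition-function random --temperature 0.25 --gpu-per-trial 0.2 \
    ihdp --root assets/ deep-kernel-gp
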
uqhwen2 commented 11 months ago

Hi Andrew,

I carefully followed your installation instructions and successfully installed the causal-bald env via conda.

Here is the issue: I get "causal-bald: command not found" when I copy and paste your suggestion "causal-bald active-learning --job-dir experiments/ --num-trials 5 --step-size 10 --warm-start-size 100 --max-acquisitions 38 --acquisition-function random --temperature 0.25 --gpu-per-trial 0.2 ihdp --root assets/ deep-kernel-gp" into the terminal.

I'm testing other baselines, so I hope to get your code running and cite it!

anndvision commented 11 months ago

my apologies, I have pip install . listed as optional in the instructions, but it should be run; only the -e argument is optional.
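
to be explicit, the missing step after creating the conda env is just (a sketch, assuming you are in the cloned repo root):

conda activate causal-bald    # the env you already created
pip install .                 # registers the causal-bald command
causal-bald --help            # should now print usage instead of "command not found"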

uqhwen2 commented 11 months ago

I'm so sorry, it's really my bad, I just skipped that part for no reason :).

I tried:

causal-bald active-learning --job-dir experiments/ --num-trials 5 --step-size 10 --warm-start-size 100 --max-acquisitions 38 --acquisition-function random --temperature 0.25 --gpu-per-trial 0.2 ihdp --root assets/ deep-kernel-gp

"causal-bald: command not found" SOLVED! But still get the error:

ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-2.32 GB) is less than -6% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).

I tried to crack it myself, and ChatGPT suggested setting

ray.init(memory=8 * 1024 * 1024 * 1024, object_store_memory=4 * 1024 * 1024 * 1024)

and I found that main.py already has the default:

@click.option(
    "--object-memory-store",
    default=8000000000,
    type=int,
    help="ray parameter, default=8000000000",
)

My machine has 32GB of RAM and the GPU has 24GB; I tried scaling up the default value, but still hit the same issue. I also tried setting

ray.init(memory=8 * 1024 * 1024 * 1024, object_store_memory=4 * 1024 * 1024 * 1024)

at the very start of the code, but it doesn't seem to help. Any ideas?

anndvision commented 11 months ago

instead of adding a ray.init() at the start of the code, you could try editing line 230 of the main.py file.

not sure it would work

can also try commenting out line 230

tbh, choosing ray was a huge mistake

a friend had a similar issue before; let me try to dig up that conversation in the meantime
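
something like this is the shape of the edit I mean (not the literal contents of line 230, just a sketch):

import ray  # already imported in main.py

# option 1: shrink the explicit reservation so it fits your machine, as the
# error message suggests (value illustrative; kwarg names vary across ray versions)
ray.init(object_store_memory=4_000_000_000)

# option 2: comment the init out entirely and let ray pick its own defaults
# ray.init(object_store_memory=8_000_000_000)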

uqhwen2 commented 11 months ago

🖐🚨 Commenting out line 230 works!!! The code is now running!

Thanks for the help so far mate! Getting other people's code running is never a smooth path hahaha.


🎊🎉UPDATE:

Commenting out line 230 is not the best option. The program crashed after a while, and I noticed an error prompt like:

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
(pid=597992) /home/hl506-8850/anaconda3/envs/causal-bald/lib/python3.9/site-packages/torch/cuda/__init__.py:143: UserWarning: 
(pid=597992) NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
(pid=597992) The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
(pid=597992) If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch

So I upgraded PyTorch to fit my GPU (e.g., in my case, pip install --upgrade torch==2.1.1 torchvision==0.16.1 -f https://download.pytorch.org/whl/cu118/torch_stable.html, and conda install protobuf==3.20.3), uncommented line 230, and a miracle happened! The code now runs smoothly and the GPU is actually used!

If anyone else meets the same error, just find a PyTorch build that matches your GPU :).
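
For anyone debugging the same thing, a quick sanity check that the installed PyTorch build supports your GPU (nothing repo-specific here):

import torch

# print the installed build and the CUDA architectures it was compiled for;
# an RTX 3090 needs sm_86 to appear in the arch list
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())
print(torch.cuda.get_arch_list())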

Cheers!

anndvision commented 11 months ago

Thanks for looking into this! Happy that you found a solution. Not sure what the best step forward is, since the dependencies are quite old now, but pinning them may be necessary for strict reproducibility.