changes needed to run on cluster

NSAPH-Projects / topological-equivariant-networks

E(n)-Equivariant Topological Neural Networks

MIT License

changes needed to run on cluster #24

Closed: ekarais closed this issue 4 months ago

ekarais commented 6 months ago

We need the following changes to run training on the clusters:

In environment.yaml: add nvidia to channels and add pytorch-cuda=11.8 to dependencies.
In main_qmp9.py: adapt the wandb.init call so that the run is logged to the team account ten-harvard (how? @gdasoulas)

gdasoulas commented 6 months ago

Replace environment.yaml with the following:

name: ten
channels:
  - pytorch
  - nvidia
  - anaconda
  - conda-forge
  - defaults
  - pyg
dependencies:
  - coverage=7.4.3
  - gudhi=3.8.0
  - matplotlib=3.7.2
  - networkx=3.1
  - numpy=1.22.4
  - pandas=1.4.2
  - pip=23.2.1
  - conda-forge::pre-commit=3.6.0
  - pyg::pyg
  - pygments
  - pexpect
  - pytest=7.4.4
  - python=3.10
  - pytorch=2.1.0
  - pytorch-cuda=11.8
  - pyg::pytorch-scatter
  - rdkit
  - scipy=1.11.3
  - seaborn
  - tqdm=4.66.1
  - wandb=0.15.12
  - pip:
    - git+https://github.com/pyt-team/TopoNetX@cede811485aefcff1d013dbb94942e8f92ac5d05
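
Once the environment is rebuilt from the file above, a quick way to confirm that the CUDA build of PyTorch was picked up on a cluster node is a check like the one below. This is a minimal sketch, not part of the repository: the helper name describe_cuda_setup is hypothetical, and it only reports what PyTorch itself exposes.

```python
def describe_cuda_setup():
    """Report whether PyTorch is importable and whether it can see a CUDA device.

    Hypothetical helper for illustration; not part of the repository.
    """
    try:
        import torch  # provided by the pytorch=2.1.0 entry in environment.yaml
    except ImportError:
        # The environment was not activated or PyTorch failed to install.
        return {
            "torch_installed": False,
            "torch_version": None,
            "cuda_available": False,
            "cuda_version": None,
        }

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # False on a node without a visible GPU, even with a CUDA build.
        "cuda_available": torch.cuda.is_available(),
        # None for CPU-only builds; a version string for CUDA builds.
        "cuda_version": torch.version.cuda,
    }


report = describe_cuda_setup()
for key, value in report.items():
    print(f"{key}: {value}")
```

If cuda_version prints as None, the CPU-only build was most likely resolved, which usually points to the nvidia channel or the pytorch-cuda pin not being applied.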

gdasoulas commented 6 months ago

Regarding the wandb.init call, replace the original line with: wandb.init(entity='ten-harvard', project=f"QM9-{args.target_name}")
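
For context, here is a minimal sketch of how that call might sit in a training script. Only the entity and project values come from the line above; the --target_name flag, its placeholder default "alpha", and the helper names build_wandb_kwargs and start_run are assumptions made for illustration, not the actual contents of main_qmp9.py.

```python
import argparse


def build_wandb_kwargs(args, entity="ten-harvard"):
    """Collect the keyword arguments for wandb.init in one place (hypothetical helper)."""
    return {
        "entity": entity,  # team account the run is logged to
        "project": f"QM9-{args.target_name}",  # one project per QM9 target
    }


def start_run(args):
    """Start a wandb run under the team account; requires wandb to be installed."""
    import wandb  # imported lazily so the rest of the sketch runs without it

    return wandb.init(**build_wandb_kwargs(args))


# Hypothetical CLI: the real script may define target_name differently.
parser = argparse.ArgumentParser()
parser.add_argument("--target_name", default="alpha")
args = parser.parse_args(["--target_name", "alpha"])

kwargs = build_wandb_kwargs(args)
print(kwargs)
```

Calling start_run(args) then logs the run to the team account, provided the logged-in wandb user is a member of that team.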

ekarais commented 4 months ago

There haven't been any issues training on the clusters, so this is a non-issue. Closing.