-
Hi, thanks a lot for putting the Docker container together. It's great. I wonder if it would be possible to have a MESA Docker image available on Docker Hub. It'd make it more convenient to use MESA on the h…
-
Dear AutodE community,
I have been testing AutodE on our university cluster for quite a while, and I have only managed to run it as an interactive job. My understanding is that this should not be the cas…
-
I can help with this if you need help.
-Jackson
-
Will you update the docs for the new 18.04 LTS?
Also, the Ubuntu 18.04 Slurm version is not the latest, and a simple backport from sid builds fine.
cuda-10 and nvidia-410 with self-built tensorflo…
-
I am trying to run the `run_full_learning.sh` script to train the full TULIP-TCR model. I am running into a segmentation fault after these logging messages:
```
2023-11-14 15:58:54.621704: I tensorflo…
```
-
### Bug description
Hello! When I train with the DDP strategy, any kind of crash, such as an `Out Of Memory (OOM)` error or a `scancel` of the Slurm job, causes the Slurm nodes to drain due to `Kill task failed`, which …
-
I'm interested in running funflow pipelines that ship external jobs to a cluster scheduler, e.g. Torque/Slurm.
I was hoping to get some ideas on how to do this. I'm happy to write code and contribute…
-
The code here
https://github.com/Sense-GVT/DeCLIP/blob/main/prototype/model/image_encoder/modified_resnet.py#L103
calls an undefined method (`new_group`).
-
I am trying to set up a new Slurm cluster. I noticed that my nodes are only running one job per node. The Nextflow script is identical on the old and the new cluster. I have the directi…
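If the new cluster tracks memory as a consumable resource (for example the `cons_tres` select plugin with a memory-aware `SelectTypeParameters` setting), one possible cause worth checking is that each job ends up holding most or all of a node's memory, so the CPUs go unused. The arithmetic below is a hypothetical sketch of that packing limit; the function name `jobs_per_node` and all node and job sizes are made up for illustration, and whether it applies depends on the truncated directives and the cluster's `slurm.conf`.

```python
# Hypothetical sketch: how many identical jobs fit on one node when both
# CPUs and memory are consumable resources. All numbers are illustrative.

def jobs_per_node(node_cpus: int, node_mem_mb: int,
                  job_cpus: int, job_mem_mb: int) -> int:
    """Return the number of identical jobs that fit, limited by CPUs and memory."""
    if job_cpus <= 0 or job_mem_mb <= 0:
        raise ValueError("job requests must be positive")
    by_cpu = node_cpus // job_cpus      # limit imposed by cores
    by_mem = node_mem_mb // job_mem_mb  # limit imposed by memory
    return min(by_cpu, by_mem)


# 4-CPU / 8 GB jobs on a 32-CPU / 128 GB node: CPU-bound.
print(jobs_per_node(32, 128_000, 4, 8_000))    # → 8
# Same jobs, but each is granted the whole node's memory: only one fits.
print(jobs_per_node(32, 128_000, 4, 128_000))  # → 1
```

Comparing the CPU and memory actually allocated to each job (for example in `scontrol show job` output) against the node totals shows which of the two limits is binding.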
-
### What happened + What you expected to happen
**1. Bug**
When running Ray on a Slurm cluster, it seems that Ray RLlib does not respect which nodes are specified as the head and worker nodes with th…
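Ray's Slurm deployment guide starts the head on one node of the allocation and the workers on the remaining nodes, typically by expanding `$SLURM_JOB_NODELIST` with `scontrol show hostnames` and taking the first host as the head. When debugging which node actually becomes the head, it can help to reproduce that split explicitly. The Python below is a hypothetical sketch (the names `split_top_level`, `expand_entry` and `split_head_workers` are illustrative, not part of Ray or Slurm). It only handles one bracket group per entry, so `scontrol show hostnames` remains the authoritative expander.

```python
# Hypothetical sketch: expand a simple Slurm-style hostlist and pick the
# first host as the Ray head, the rest as workers. Supports entries like
# "node[01-03,07]" with a single bracket group; not a full hostlist parser.
import os
import re
from typing import List, Tuple


def split_top_level(hostlist: str) -> List[str]:
    """Split on commas that are not inside [...] brackets."""
    parts: List[str] = []
    buf: List[str] = []
    depth = 0
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if buf:
        parts.append("".join(buf))
    return parts


def expand_entry(entry: str) -> List[str]:
    """Expand one entry such as 'node[01-03,07]' into individual hostnames."""
    m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", entry)
    if m is None:
        return [entry]  # plain hostname, nothing to expand
    prefix, ranges, suffix = m.groups()
    hosts: List[str] = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)  # preserve zero padding, e.g. "01"
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


def split_head_workers(hostlist: str) -> Tuple[str, List[str]]:
    """Return (head, workers) using the first expanded host as the head."""
    hosts = [h for e in split_top_level(hostlist) for h in expand_entry(e)]
    if not hosts:
        raise ValueError("empty hostlist")
    return hosts[0], hosts[1:]


if __name__ == "__main__":
    # Falls back to an example hostlist when run outside a Slurm job.
    head, workers = split_head_workers(
        os.environ.get("SLURM_JOB_NODELIST", "node[01-03],gpu7"))
    print("head:", head)
    print("workers:", workers)
```

For `node[01-03],gpu7` this picks `node01` as the head and `node02`, `node03`, `gpu7` as workers; if RLlib schedules work differently from that split, the mismatch is likely in how the Ray processes are started rather than in the allocation itself.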