Open shashidhar22 opened 10 months ago
Hello,

I am trying to run the `run_full_learning.sh` script to train the full TULIP-TCR model:

```shell
python full_learning.py --train_dir ../mydatapath/dataNew/Full_train_pretune_mhcX_2.csv \
    --test_dir ../mydatapath/dataNew/VDJ_test_2.csv \
    --modelconfig configs/shallow.config.json \
    --save multiTCR_s_flex
```

I am running into a segmentation fault following these logging messages. Is this just a memory issue, or is there some other explanation?

For context, I am running the code on a Linux node with 36 cores and 650 GB of RAM. Below is the nvidia-smi output.

Please let me know if you need any additional information.

---

Not sure if that is the problem, but you should adapt the SLURM command in the .sh script to your own machine/cluster. Maybe try launching the .py directly in an interactive shell to see whether that is the issue.
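If the interactive run still segfaults, one cheap way to localize the crash (a generic sketch, not something TULIP-TCR does itself) is the stdlib `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal:

```python
import faulthandler

# Enable the stdlib fault handler early in the script (equivalently,
# run `python -X faulthandler full_learning.py ...`): on a segmentation
# fault it writes the current Python traceback to stderr, which shows
# which call (e.g. a native-extension op) was executing when it crashed.
faulthandler.enable()

print(faulthandler.is_enabled())  # → True
```

The resulting traceback usually narrows the fault to a specific library call, which helps distinguish an out-of-memory kill from a bug in a compiled extension.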