Hi Jess, I'm not exactly sure what's going on there. We've moved on to a new codebase here: https://github.com/calico/baskerville, where we're continuing to actively develop and follow better software engineering practices. I'd recommend jumping over and trying your application there. Reach out if you get stuck, and we'll try to help.
Thank you! I'll give that a go
Hi,
I'm hoping to use Basenji on an HPC cluster that uses Slurm, so I have been working through the tutorials to check that my install works correctly and to learn how to run the scripts. (The tutorials are very well explained - thank you for making it so accessible!)
I have successfully run the first tutorial (preprocess) but am having issues with the train_test tutorial.
I submitted the following to a GPU node on our cluster:
python bin/basenji_train.py -o tutorials/models/heart tutorials/models/params_small.json data/heart_l131k_redownload
I tried running it on the data generated by the preprocessing tutorial, and also on the data downloaded at the start of the train_test tutorial, and had the same issue both times.
I got the following error:
This is the output of the job before it failed:
I've spoken to our computing team and they don't think it's an issue with the install. Do you have any insight into what might be causing this error? I am not familiar with TensorFlow, so I'm not sure whether this is an issue with the way I'm trying to run Basenji.
I'd really appreciate any help or guidance! Happy to provide any further info if required.
Many thanks, Jess