dattalab / keypoint-moseq

https://keypoint-moseq.readthedocs.io

Kernel dies while fitting the full model #15

Closed (Aexolowski closed this issue 1 year ago)

Aexolowski commented 1 year ago

I am running the code in a Jupyter notebook, and the kernel keeps dying at the full model fitting step. My PC and GPU should be powerful enough for that (256 GB RAM and an NVIDIA GeForce RTX 3080), so I am not sure what the issue is.

calebweinreb commented 1 year ago

Hello! We're aware of this problem and trying to figure out what's going on. So far it seems like a Windows-specific issue. Are you using Windows?

Because the problem is OS-specific, it's been hard to debug on our end. Would you be willing to update keypoint-moseq and jax-moseq as shown below? These versions are configured to print progress messages during model fitting so we can find out which step kills the kernel (make sure the keypoint MoSeq config has verbose set to True).

pip install -U git+https://github.com/dattalab/jax-moseq
pip install -U git+https://github.com/dattalab/keypoint-moseq.git@dev
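
After upgrading, it may be worth confirming which versions the notebook kernel actually sees, since a kernel started from a different environment would not pick up the new installs. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution names are assumed to match the pip package names `keypoint-moseq` and `jax-moseq`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution names (taken from the pip commands above).
versions = {name: installed_version(name) for name in ("keypoint-moseq", "jax-moseq")}

for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in a notebook cell (rather than a separate terminal) reports what the kernel itself has installed.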

Aexolowski commented 1 year ago

Hi, thanks for getting back to me! Yes, I am using Windows 11. I ran the updated version, and this time the full model was successfully generated! But then the kernel died at the model application step.

calebweinreb commented 1 year ago

Great! The apply_model step isn't strictly required (unless you want to apply an existing model to new data). It mostly serves to prevent discontinuities caused by breaking up each session into chunks for parallelization. But these discontinuities are very rare (one per 10,000 frames by default). So for now, I would recommend running with num_iters=0, which will stitch the chunks back together while leaving the discontinuities. The full command would be:

results = kpms.apply_model(coordinates=coordinates, confidences=confidences, 
                           project_dir=project_dir, **config(), **checkpoint, num_iters=0)
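
To make the chunking point concrete, here is a rough sketch of how a session split into fixed-length chunks yields one possible discontinuity per chunk boundary once the chunks are stitched back together. This is an illustration only, not keypoint-moseq's actual implementation: the function names are hypothetical, and the 10,000-frame chunk length is an assumption inferred from the "one per 10,000 frames" figure above.

```python
def split_into_chunks(num_frames, chunk_len=10_000):
    """Return (start, end) frame ranges that cover a session of num_frames frames."""
    return [
        (start, min(start + chunk_len, num_frames))
        for start in range(0, num_frames, chunk_len)
    ]


def stitch(chunks_of_labels):
    """Concatenate per-chunk label sequences into one session-long list."""
    stitched = []
    for labels in chunks_of_labels:
        stitched.extend(labels)
    return stitched


def boundary_frames(num_frames, chunk_len=10_000):
    """Frames where a new chunk starts, i.e. where a discontinuity could appear."""
    return [start for start, _ in split_into_chunks(num_frames, chunk_len)][1:]


session_len = 35_000  # hypothetical session length, in frames
ranges = split_into_chunks(session_len)
boundaries = boundary_frames(session_len)
print(f"{len(ranges)} chunks, possible discontinuities at frames {boundaries}")
```

Under these assumptions, a 35,000-frame session has only three boundary frames, which is why leaving the discontinuities in place with num_iters=0 affects very little of the data.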

If you're comfortable, it would be helpful if you shared the data (checkpoint file, config and keypoint data) so I can try to debug this on one of our Windows machines. You could just send it to my email (calebsw@gmail.com).

Aexolowski commented 1 year ago

I just ran the code using 0 iterations at the apply-model step and now it works! (I am just using the DLC example data from the website, will get back to you in case it doesn't work with my own data). Thanks a lot for your help!

calebweinreb commented 1 year ago

There's now a section in the docs for dealing with a dead kernel: https://keypoint-moseq.readthedocs.io/en/latest/troubleshooting.html