Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Model checkpoint warnings in preprocessors including content-categorizer #853

Closed jeffbl closed 1 week ago

jeffbl commented 2 months ago

Looking at the logs on unicorn while sending photos, it appears that recent updates now require a model checkpoint upgrade. For now, Lightning seems to perform the upgrade each time the checkpoint is loaded, but the log also gives instructions for making the change permanent, e.g.:

content-categoriser-1 | 2024-07-16T22:15:15.617475546Z Lightning automatically upgraded your loaded checkpoint from v1.4.4 to v1.9.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file latest-0.ckpt`

Low priority as there doesn't seem to be any end-user impact, but don't want it to become an issue later if we do need to make any updates.

@AndyBaiMQC As you look at the preprocessor stack, if you think we'll be deprecating any of the impacted preprocessors, please don't spend time on the updates.

AndyBaiMQC commented 1 month ago

There appears to be a workaround: https://github.com/Lightning-AI/pytorch-lightning/issues/17220

jeffbl commented 1 month ago

I'm still seeing this warning on unicorn:

content-categoriser-1 | 2024-08-02T20:20:10.374135435Z Lightning automatically upgraded your loaded checkpoint from v1.4.4 to v1.9.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file latest-0.ckpt`

Is this just a matter of updating to the current version of lightning? I tried reading through the link above, but it wasn't clear to me.

jeffbl commented 1 month ago

(And reminder, if we're going to deprecate any of these based on your upcoming updates, then no need to fix!!)

AndyBaiMQC commented 1 month ago

I see... I didn't fully follow the example in the link, but if we run into this again I'll take a look.

jeffbl commented 1 month ago

On unicorn, run /var/docker/image/imagelogs and watch as you make a photo query using the extension targeting unicorn. (If you need instructions for any of that, please ping me ASAP!)
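For anyone who would rather not eyeball the log stream, here is a minimal sketch of a filter that picks out the checkpoint upgrade warning. It assumes the log lines have the `service | timestamp message` shape shown in the excerpts above; the names `UPGRADE_WARNING`, `UpgradeWarning` and `find_upgrade_warnings` are made up for illustration and are not part of the IMAGE server or Lightning.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed line shape, based on the log excerpts quoted in this issue:
#   <service> | <timestamp> Lightning automatically upgraded your loaded
#   checkpoint from v<old> to v<new>. ...
UPGRADE_WARNING = re.compile(
    r"^(?P<service>\S+)\s*\|\s*(?P<timestamp>\S+)\s+"
    r"Lightning automatically upgraded your loaded checkpoint "
    r"from v(?P<old>\d+(?:\.\d+)*) to v(?P<new>\d+(?:\.\d+)*)"
)


class UpgradeWarning(NamedTuple):
    """One occurrence of the warning (hypothetical helper type)."""
    service: str
    timestamp: str
    old_version: str
    new_version: str


def find_upgrade_warnings(lines: Iterable[str]) -> Iterator[UpgradeWarning]:
    """Yield one record per log line that contains the upgrade warning."""
    for line in lines:
        match = UPGRADE_WARNING.match(line.strip())
        if match:
            yield UpgradeWarning(
                match["service"],
                match["timestamp"],
                match["old"],
                match["new"],
            )
```

Feeding it any iterable of log lines (for example `sys.stdin` when the log output is piped in) gives one record per affected service, which makes it easy to see whether other preprocessors show the same warning.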

AndyBaiMQC commented 1 month ago

Ping'd

AndyBaiMQC commented 1 week ago

python -m lightning.pytorch.utilities.upgrade_checkpoint model.ckpt

@jeffbl Seems like I omitted this. Can we try this and see if it suppresses warnings? We don't need to pass the --file flag any more.

AndyBaiMQC commented 1 week ago

Image

AndyBaiMQC commented 1 week ago

@jeffbl Issue resolved for this one. For future reference, if we ever use Torch Lightning (TL) models, run `python -m lightning.pytorch.utilities.upgrade_checkpoint model.ckpt` (replace model.ckpt with the actual checkpoint file name).

jeffbl commented 5 days ago

I see the model is updated on unicorn, but the image is not rebuilt so the error is still in the logs, and of course it is still an issue on pegasus. Anyway, if we're moving to deprecate the old preprocessor, there is no sense fixing this, and if we have the problem with other models, we'll fix it there. Leaving this closed as "won't fix", since I don't think it is happening elsewhere, and we should know what to do if it crops up again.

AndyBaiMQC commented 4 days ago

Got it. Sorry, I didn't realize it needs rebuilding. Still, it would be worth creating a 'shortcut' script that does model updates (if we ever use Torch Lightning based models) and running it manually from time to time.
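A rough sketch of what such a shortcut script might look like, assuming the checkpoints are `.ckpt` files somewhere under one directory and that the command form quoted earlier in this thread (`python -m lightning.pytorch.utilities.upgrade_checkpoint model.ckpt`) is the right one for the installed Lightning version. The function names and the dry-run default are illustrative choices, not anything that exists in the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_checkpoints(root: Path) -> List[Path]:
    """Return every *.ckpt file under root, in sorted order."""
    return sorted(p for p in Path(root).rglob("*.ckpt") if p.is_file())


def build_upgrade_command(checkpoint: Path, python: str = sys.executable) -> List[str]:
    """Build the upgrade command quoted in this thread for one checkpoint."""
    return [
        python,
        "-m",
        "lightning.pytorch.utilities.upgrade_checkpoint",
        str(checkpoint),
    ]


def upgrade_all(root: Path, dry_run: bool = True) -> List[List[str]]:
    """Print the upgrade command for each checkpoint; run it unless dry_run."""
    commands = [build_upgrade_command(p) for p in find_checkpoints(root)]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # Assumption: the interpreter has Lightning installed.
            # check=True stops at the first failing upgrade.
            subprocess.run(command, check=True)
    return commands
```

Calling `upgrade_all(Path("."))` only prints what would be run; passing `dry_run=False` actually invokes the upgrade. As noted above, the upgraded checkpoint only stops the warning once the image that bundles it has been rebuilt.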