crockwell / Cap3D

[NeurIPS 2023] Scalable 3D Captioning with Pretrained Models
https://huggingface.co/datasets/tiange/Cap3D

Finetuning SOTA 3D Models #1

Open blueangel1313 opened 1 year ago

blueangel1313 commented 1 year ago

Hi! I saw in a tweet that you alluded to fine-tuning SOTA pre-trained models: https://twitter.com/_crockwell/status/1668649409918246913?s=20

Could you perhaps offer some insight into how you did this? Would really appreciate it!

crockwell commented 1 year ago

Hey blueangel,

Thanks for your interest! We will be releasing code (and pretrained models) to replicate this process in the coming weeks. For now, I'd recommend reading Appendix I for details on finetuning (please check the latest arXiv version https://arxiv.org/pdf/2306.07279.pdf).

Best, Chris

tiangeluo commented 1 year ago

Hi @blueangel1313, we've provided finetuning code for Point-E (text-to-3D) and Shap-E. Please refer to the details in https://github.com/crockwell/Cap3D/tree/main/text-to-3D.
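At a high level, the objective is the standard diffusion training loss on Shap-E latents, conditioned on Cap3D captions. A heavily simplified sketch of one finetuning step (not our actual script; `latents` and `texts` stand in for batches of encoded Objaverse shapes and their captions, and the `training_losses` call assumes the guided-diffusion-style API that shap-e inherits):

```python
import torch
from shap_e.models.download import load_model, load_config
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Text-conditional latent diffusion model plus its diffusion process.
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative lr

def finetune_step(latents, texts):
    # Sample a random timestep per example and compute the diffusion loss.
    t = torch.randint(0, diffusion.num_timesteps, (latents.shape[0],), device=device)
    losses = diffusion.training_losses(model, latents, t, model_kwargs=dict(texts=texts))
    loss = losses['loss'].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```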

blueangel1313 commented 1 year ago

Awesome!!! Can't wait to give this a whirl :D

blueangel1313 commented 1 year ago

Hi there again! I greatly appreciate your team sharing this training codebase, it's been incredibly helpful. I was wondering if there might be a possibility of your team sharing the fine-tuned weights for both Shap-E and Point-E from your tests? I understand completely if this isn't feasible, but I thought it wouldn't hurt to ask. Thanks again for all your assistance!

tiangeluo commented 1 year ago

For sure, please check https://huggingface.co/datasets/tiange/Cap3D/tree/main/our_finetuned_models. We currently provide two finetuned models, which are finetuned with half of our data. More models are on the way.

blueangel1313 commented 1 year ago

Thank you for sharing this incredible resource! I have downloaded the Shap-E weights and conducted some tests on my end. Interestingly, I observed a trend similar to the one discussed in the paper: no significant improvement in the model's performance.
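For reference, here is roughly how I loaded the weights and sampled, in case it helps others reproduce the tests (a minimal sketch following the shap-e sampling example; the checkpoint path is a placeholder, and the `'model_state_dict'` key is an assumption about the checkpoint layout):

```python
import torch
from shap_e.models.download import load_model, load_config
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.diffusion.sample import sample_latents

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the base text-conditional model, then overwrite with the finetuned weights.
model = load_model('text300M', device=device)
ckpt = torch.load('path/to/shapE_finetuned.pth', map_location=device)  # from the HF repo above
model.load_state_dict(ckpt.get('model_state_dict', ckpt))  # handle either checkpoint layout

diffusion = diffusion_from_config(load_config('diffusion'))
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=['a red office chair']),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)
```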

Regarding the NaN error during training, have you done any additional troubleshooting or experimentation? I am considering conducting a training session with a reduced learning rate, as it may help solve the NaN issue on Shap-E. I would greatly appreciate hearing about any further steps you have taken to address this. Thanks again for your time and the valuable resources you've shared!
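Concretely, I was thinking of pairing the lower learning rate with some basic safeguards in the training loop, roughly like this (a sketch; `compute_loss` is a placeholder for the repo's actual loss computation):

```python
import torch

def train_one_epoch(model, loader, optimizer, compute_loss, max_norm=1.0):
    """Training loop with basic NaN safeguards: skip batches that produce a
    non-finite loss and clip gradient norms before each optimizer step."""
    # torch.autograd.set_detect_anomaly(True)  # slow, but pinpoints the op producing NaN
    for step, batch in enumerate(loader):
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        if not torch.isfinite(loss):
            print(f'non-finite loss at step {step}; skipping batch')
            continue
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
        optimizer.step()
```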

tiangeluo commented 1 year ago

Thank you for your follow-ups.

Would you mind sharing the results of your finetuned Shap-E with me later? I'm actually still confused about why the NaN issue occurs. I've tried smaller learning rates, but that didn't help.

Finetuning the diffusion model with more data (we currently use 330k) and also finetuning the Shap-E encoder (i.e., the transmitter) should improve performance. However, before trying those, I feel the NaN issue may be the main problem behind the suboptimal performance. I will keep you updated if I find something.

blueangel1313 commented 1 year ago

Got it. I will see if we can make any progress on the NaN issue and will keep you updated. Thanks!

camenduru commented 1 year ago

Thanks for the project ❤️ We are also trying to fine-tune an open-source model. If you want, please join us on the text-to-3d channel at https://discord.gg/k5BwmmvJJU

blueangel1313 commented 1 year ago

I was curious whether you had experimented with any other optimizers during your development process. I'm considering trying SGD to see if it might help with the NaN issue.

Apologies if this question seems basic, but any insights you could provide would be greatly appreciated! :)

tiangeluo commented 1 year ago

That's a great question!

Yeah, I've tried AdamW, SGD, and Sophia. However, all of them caused NaN issues.
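In case it's useful, swapping optimizers is just a small change in the training script, roughly like this (illustrative defaults, not our exact config; Sophia comes from a third-party implementation, so it's omitted here):

```python
import torch

def build_optimizer(model, name='adamw', lr=1e-5):
    """Build one of the optimizers discussed above with illustrative defaults."""
    params = model.parameters()
    if name == 'adamw':
        return torch.optim.AdamW(params, lr=lr, weight_decay=0.01)
    if name == 'sgd':
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    raise ValueError(f'unknown optimizer: {name}')
```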

By the way, have you made any progress or attempts on the finetuning side?

crockwell commented 4 months ago

Closing for inactivity -- @blueangel1313 please feel free to reopen if you end up having + wanting to share progress on finetuning. No worries if not. Thanks!

tiangeluo commented 4 months ago

Hi @blueangel1313 ,

I wanted to update you on some recent advancements we've made with the Shap-E finetuning performance. By addressing the hallucinations in the Cap3D captions, we've successfully enhanced its performance. You can review the detailed findings in this study: arXiv link.

However, I'm currently facing challenges with NaN issues in the data. If you have any new insights, please feel free to share them with me!