EternityDS closed 1 year ago
I have the same problem. How can I get the LabeledHybrid dataset? Also, could you share your email address so that we can contact you directly? Thank you.
(1) You may need to download the latest code. The latest code supports post-pre-training based on the LabeledHybrid dataset. Regarding the UnlabeledHybrid and LabeledHybrid datasets, due to their significant data volume and the licensing considerations associated with each dataset, we are attempting to integrate the processing methods for these datasets into code that can be directly utilized. Please be patient as we work on this integration.
(2) We have conducted the ablation experiment and updated it on the repository's homepage. It's in the seventh section.
As mentioned above, we are attempting to integrate the processing methods for these datasets into code that can be directly utilized. My email address is 3120215466@bit.edu.cn, and you can reach out to me directly.
Thank you for your answer. If possible, please upload the preprocessing code as soon as you can.
Thanks for your quick response! I believe I downloaded the latest code, but it still does not seem to support the LabeledHybrid dataset.
If I missed anything, please let me know! Thank you again for your help!
I apologize deeply for our oversight, and I greatly appreciate you bringing this matter to our attention. Because the post-pre-training stage is supervised, we have integrated it into the fine-tuning process in the code. Therefore, you just need to include the --finetune_model flag in the command. We have also updated the README to reflect this change.
Thanks for the great work. I have two questions about the post-pretraining.
I notice the current codebase does not support the LabeledHybrid dataset. Could you please describe this dataset in more detail and explain how you use it for the post-pretraining (e.g., the loss design)?
Without the post-pretraining, how much does performance drop for PointGPT-B and PointGPT-L?