VAST-AI-Research / TripoSR

MIT License
4.47k stars 512 forks

Great!!! GREAT!!! Look here!!! #15

Open yuedajiong opened 7 months ago

yuedajiong commented 7 months ago

great!!! Great!!! GREAT!!!

please don't close this, keep a few days. :-)

I have been engaged in 3D reconstruction and generation for a long time; while not an expert, I am definitely experienced. I have read thousands of papers and run hundreds of codebases. This is the first and, so far, the only one that I genuinely want to give a thumbs up to.

This is the only work that comes close to my ideal of an "ultimate vision algorithm", covering one of its major pieces of functionality, and with decent quality. I will share more thoughts gradually.

(The code is already very clean, cleaner than 99% of open-source projects, but still not clean enough; I made a tidier version of my own, including training.)

image

yuedajiong commented 7 months ago

My real test case: 000003 wizard.zip

Very good! This is SOTA, the only one, and I have tried 100+ open-source projects!

mrbid commented 7 months ago

Yeah I like it too. Would like it more if it output PLY files too. ;) 3DTopia is now #2 place to this.

yuedajiong commented 7 months ago

Hi, bro:
Exporting .ply is easy, just like this: mesh.export(os.path.join(output_dir, "mesh.ply"))

mesh.zip
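
If you would rather not rely on trimesh's `mesh.export` for this, a minimal ASCII PLY writer is only a few lines. This is a dependency-free sketch; `write_ply` is a hypothetical helper, and the single triangle stands in for a real TripoSR mesh:

```python
# Minimal ASCII PLY writer in pure Python; the vertex/face data here are a
# hypothetical single triangle standing in for the TripoSR mesh output.
import os

def write_ply(path, vertices, faces):
    """Write vertices [(x, y, z), ...] and faces [(i, j, k), ...] as ASCII PLY."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(vertices)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write(f"element face {len(faces)}\n")
        f.write("property list uchar int vertex_indices\n")
        f.write("end_header\n")
        for x, y, z in vertices:
            f.write(f"{x} {y} {z}\n")
        for face in faces:
            f.write(f"{len(face)} " + " ".join(str(i) for i in face) + "\n")

output_dir = "output"
os.makedirs(output_dir, exist_ok=True)
write_ply(os.path.join(output_dir, "mesh.ply"),
          vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
          faces=[(0, 1, 2)])
```

The resulting file opens in MeshLab, Blender, or any viewer that accepts ASCII PLY.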

yuedajiong commented 7 months ago

@mrbid you said: "3DTopia is now #2 place to this." Which leaderboard?

mrbid commented 7 months ago

@mrbid you said: "3DTopia is now #2 place to this." Which leaderboard?

I love 3DTopia, but it is my opinion that TripoSR is better currently. No offence meant by the comment.

I make this comment mostly comparing only the first stage of 3DTopia to TripoSR. However, I would be open to being proven wrong with examples of where the first stage of 3DTopia outperforms TripoSR, be it in speed or quality. :bow:

hughkhu commented 7 months ago


Could you share your training code? It will be appreciated.

yuedajiong commented 7 months ago

@hughkhu
I will share it with you later. It is just trainable; there has been no long training run to get the best weights, so you can train and tune it yourself.

Email me and I will send you a copy. (The code is still in development.)

hughkhu commented 7 months ago


I have already sent the email. Looking forward to a positive response.

aakash-chaddha commented 7 months ago

Hello @bennyguo, thank you so much for your research. If possible, could you share the training code with me? Here's my email: aakashchaddha@gmail.com Also, could you share a timeline for if and when you will release the training code? Thanks once again.

yuedajiong commented 7 months ago

@hughkhu Sent; check your mail. It is just a PoC.

hddy2000 commented 7 months ago

@yuedajiong Could you share the training code with me? Many thanks! I will email you now!

LaFeuilleMorte commented 7 months ago

Bro, could you share the training code? My email is 13716855718@163.com. Thanks!

dlwns97 commented 7 months ago

@yuedajiong I'm thrilled to hear the good news that you created training code. Could you share the code with me? I'm really grateful for your work.

This is my email address: dlwns23@sogang.ac.kr

yuedajiong commented 7 months ago

Guys, I have not done large-scale from-scratch training of this model. So my code changes are just two: 1) Inference part: I tidied the author's code a bit, mainly to make it easier for me to study; a few dozen lines of code show the core of the algorithm, which is convenient for my own algorithm research. 2) Training part: I use the simplest data as input, all the way to building the loss on the output, so it can be trained end to end.

To really train weights of quality similar to the author's, there are many small tricks. Things the author did to get better results: a) the network differs from LRM, e.g. removing the camera pose, halving the triplane dimensions, and so on; b) training on image patches (probably parts of the picture), which allows a larger batch size; c) using a mask loss. In my own experience, to "fit" an object strongly and quickly, weight the mask/silhouette heavily at the beginning and shift emphasis to the texture part later (shape first, then texture). Details like these, the author's training tricks, you have to work out yourself.
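
The "shape first, then texture" idea above can be sketched as a simple linear ramp between the mask-loss and RGB-loss weights. This is an illustrative schedule of my own, not TripoSR's actual training code; the function name and default weights are hypothetical:

```python
# Hedged sketch of a "shape first, then texture" weighting schedule: the
# mask/silhouette loss dominates early in training and the texture (RGB)
# loss dominates later. All weight values here are illustrative defaults.
def loss_weights(step, total_steps, w_mask_start=1.0, w_mask_end=0.2,
                 w_rgb_start=0.2, w_rgb_end=1.0):
    """Linearly shift emphasis from mask loss to RGB loss over training."""
    t = min(max(step / total_steps, 0.0), 1.0)
    w_mask = w_mask_start + (w_mask_end - w_mask_start) * t
    w_rgb = w_rgb_start + (w_rgb_end - w_rgb_start) * t
    return w_mask, w_rgb

# Early steps emphasize the silhouette; late steps emphasize the texture.
early = loss_weights(0, 1000)
late = loss_weights(1000, 1000)
```

The total loss would then be `w_mask * mask_loss + w_rgb * rgb_loss`; cosine or staged schedules are equally plausible variants.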

I am pasting the code here; download it, then refine and modify it yourself. If I get a better version, I will post it here too.

1) Install all necessary packages; refer to the TripoSR webpage. 2) Open superv.py and go to the tail: download the checkpoints and save them to ./ckpt/. 3) Copy your test .png, uncomment the infer call, and try it; make sure this step is OK. 4) Adjust the necessary network parameters (layers) for older GPUs (GPU memory limit), then try training.

I implemented a newer version with a clean ViT, so I can easily modify the whole network structure and then run training on any older GPU for research.

NOTICE:

  1. The training code is just a PoC (proof of concept); it was written in one day, including the inference-code modifications.
  2. All TODOs are in superv.py.

If I keep focusing on this code, I will share a more complete version with you later.

triposr.zip

FisherYuuri commented 7 months ago

Boss, could I ask you for a dataset (or the dataset format) for reference? Thank you; my email is 1798815097@qq.com

yuedajiong commented 7 months ago

Bro, the dataset depends on whether you build the loss in 2D or in 3D. If you build the loss in 2D, every image, video, and 3D model on the internet is your data. If you build the loss in 3D, you have to download various datasets: some have many categories; some have few categories but many fine-grained variations in viewpoint, lighting, and so on.

If you want something "universal for everything", go for big ones like Objaverse.

Some datasets take many days to download, especially from China; some look small but are several terabytes of files, such as ABO. You can only download them yourself.

If you want dynamic models, there are actually very few. I have built and maintain some special datasets for "humans + motion", "interactions", and "human parts".

Here is a screenshot of some large 3D-model datasets; just go download them yourself.

image

To construct data that the author's model can train on, you need:

  1. mask (if you don't have one, use another model to remove the background; if the data is rendered from 3D, keep the mask/RGBA at the same time)
  2. pose (if training on 3D data, record the camera pose when constructing the data; the various representations are interconvertible, and in the end it is used to compute the ray o, d)
  3. patch (the author says he takes small patches of the input and then increases the batch size; I have not dug into exactly how the patches are taken, whether it is a crop or something else)

Nothing special beyond that.
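
The "ray o, d" mentioned above can be computed from a camera pose with a standard pinhole model. This is a generic sketch, not TripoSR's actual data pipeline; the focal length and identity pose are hypothetical values:

```python
# Hedged sketch of turning a camera-to-world pose into per-pixel ray origins
# and directions (the "o, d"), using an OpenGL-style pinhole camera
# (x right, y up, camera looking down -z).
import numpy as np

def get_rays(h, w, focal, c2w):
    """c2w: 4x4 camera-to-world matrix. Returns (origins, directions), each (h, w, 3)."""
    i, j = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32), indexing="xy")
    # Per-pixel directions in camera space.
    dirs = np.stack([(i - w * 0.5) / focal,
                     -(j - h * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate into world space and normalize to unit length.
    rays_d = dirs @ c2w[:3, :3].T
    rays_d /= np.linalg.norm(rays_d, axis=-1, keepdims=True)
    # All rays share the camera center as origin.
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d

o, d = get_rays(4, 4, focal=2.0, c2w=np.eye(4, dtype=np.float32))
```

Conventions differ between codebases (OpenCV vs OpenGL axes, pixel-center offsets), so check them against whatever renderer produced your poses.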
doanthinhvo commented 7 months ago

Hi @bennyguo, thanks for your amazing work. Could you share me the training code. My email is blvrxdnthnhv@gmail.com Thank you

FisherYuuri commented 7 months ago


Hello, is this your input image? I downloaded and inspected your model, and the texture is not this fine. Or is there a way to improve the texture part?

CBQ-1223 commented 7 months ago


Hi boss, I added you on QQ: 1325966315. May I ask whether this kind of model can be fine-tuned, and roughly what compute and data it needs? I work on autonomous driving and want to use this to generate 3D assets such as cars, pedestrians lying on the ground, and so on.

chenxinli001 commented 7 months ago

Hi! Could you please share your training code? It will be greatly appreciated! My email is chenxinli@link.cuhk.edu.hk

yuedajiong commented 7 months ago

@XGGNet JUUUUUUST trainable, even on an older GPU. Formal training that can get good results requires modification and experimentation. triposr.zip

chenxinli001 commented 7 months ago


Thanks bro. You are the real hero.

Rameshkumardas commented 7 months ago

    import torch
    from network import TSR

    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    print('=========Model Class initialized========>>>>>>')
    model = TSR(radius=0.87, valid_thresh=0.01, num_samples_per_ray=128, img_size=512, depth=12, embed_dim=768, num_layers=16, num_channels=1024, cross_attention_dim=768)
    # model = TSR(radius=0.87, valid_thresh=0.01, num_samples_per_ray=128)

    # Assuming you have a model checkpoint file named 'model.ckpt'
    checkpoint_path = './ckpt/TripoSR/model.ckpt'

    # Load the model checkpoint using torch.load
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    print('=========loaded Checkpoint========>>>>>>')
    model.load_state_dict(checkpoint, strict=False)
    model.to(device)
Rameshkumardas commented 7 months ago

Hi @yuedajiong Please check

image

yuedajiong commented 7 months ago

Please check the code: 1) you can run inference with the official checkpoint; 2) I modified the network structure to be smaller, for training on a small GPU, so it cannot reuse the official checkpoint. It is just for PoC.
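
When the network has been shrunk like this, one common workaround is to reuse only the checkpoint entries whose name and shape still match. This sketch shows that filtering logic on plain name-to-shape dicts standing in for real state_dicts; the layer names and shapes are made up for illustration:

```python
# Hedged sketch of partially reusing a checkpoint in a modified (smaller)
# network: keep only entries whose name and shape match the new model.
# Plain dicts of name -> shape stand in for the real state_dicts here.
def compatible_entries(checkpoint, own_state):
    """Keep only checkpoint entries whose name and shape match the new model."""
    return {k: v for k, v in checkpoint.items()
            if k in own_state and v == own_state[k]}

official = {"encoder.weight": (768, 512), "decoder.weight": (1024, 768)}
smaller = {"encoder.weight": (768, 512), "decoder.weight": (512, 384)}  # halved dims

reusable = compatible_entries(official, smaller)
# With real tensors: own = model.state_dict(); own.update(reusable);
# model.load_state_dict(own). Note that strict=False only ignores missing or
# unexpected keys; it still raises on shape mismatches, hence the filtering.
```

Only the unchanged layers transfer; the resized ones start from fresh initialization.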

FisherYuuri commented 7 months ago

Hi, why is the superv.py I downloaded from you different from the author's?

yuedajiong commented 7 months ago

@FisherYuuri I am still revising it. The version I uploaded still supports loading the original checkpoint; the class and variable names are unchanged. My code is just a bit more concise.

yuedajiong commented 7 months ago

This paper, and LRM and others, are indeed quite good.

The biggest problem now is that they are trained on 3D datasets. Even if only the final rendering step needs the camera angle, it is still 3D, and that is a big limitation.

It needs a breakthrough.

FisherYuuri commented 7 months ago

Looking forward to your new revision! You are the real hero!

CBQ-1223 commented 7 months ago

Boss, I have read through your infer and train code. May I ask what timeline you expect for the training code, or is there anything I could help develop?

joshkiller commented 6 months ago


Good morning bro. Have you trained your model with the code? I am doing my internship right now and need to train a model like this so I can join it to a stable diffusion model to create a text-to-3D pipeline. So please, if someone can give me some tips, it will definitely help me. And thanks to the team for this amazing work.

yuedajiong commented 6 months ago

My status is still open for exploration, and too many technical issues remain open: which 3D/4D representation is better; how to incrementally add priors; how to truly achieve zero-camera, that is, a rendering phase of training that does not require a camera pose; how to represent different motions in 4D; how to handle interactions; and so on. Therefore, I did not train at large scale, because large-scale training requires a lot of compute and storage, and when the algorithm is not yet settled at small and medium scale, that training cost is very high. If you want to train, for example with relatively accurate camera poses, you can only take the poses from Objaverse, render, and record the camera poses. These renderings are very compute- and storage-intensive; model training also requires a GPU cluster.

Correct trainable code for different algorithms, including this one, is ready and being optimized; I can share a newer version. If you have enough CPU and storage, you can render and train ...

DiamondGlassDrill commented 6 months ago

@yuedajiong would be interested too in the newer version.

yuedajiong commented 6 months ago

Too many files, so I uploaded it to: https://github.com/yuedajiong/super-ai-vision-stereo-world-generate-triposr/. Have fun.

DiamondGlassDrill commented 6 months ago

@yuedajiong awesome work!!! And thank you very much.

learn01one commented 6 months ago

https://github.com/yuedajiong/super-ai-vision-stereo-world-generate-triposr/

Hello, thank you very much for your great contribution. Is this link still available?

yuedajiong commented 6 months ago

Yes, but it is just a simple training demo.