ali-vilab / AnyDoor

Official implementation of the paper: AnyDoor: Zero-shot Object-level Image Customization
https://ali-vilab.github.io/AnyDoor-Page/
MIT License

Question about MVImageNet Dataset (3.4 TB) #98

Open trungpx opened 2 months ago

trungpx commented 2 months ago

Dear authors,

Could you please tell me how many images from MVImgNet were used?

Their website says: "MVImgNet contains 6.5 million frames from 219,188 videos, the total size is about 3.4 TB." So I was wondering: did you use this full 3.4 TB dataset to train AnyDoor to achieve the reported performance?

XavierCHEN34 commented 2 months ago

No, we only use the subset with segmentation masks.
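
If it helps, here is a minimal sketch of how you could keep only the instances that ship a mask folder before training. The directory names (`images/`, `masks/`) and layout are hypothetical; check them against the actual MVImgNet release you downloaded.

```python
# Sketch: keep only MVImgNet instances that also have segmentation masks.
# The class/instance/masks layout is an assumption, not the confirmed format.
from pathlib import Path


def instances_with_masks(root: str):
    """Yield instance directories that contain both images and masks."""
    for mask_dir in Path(root).glob("*/*/masks"):
        instance_dir = mask_dir.parent
        # Only keep instances whose image folder is non-empty.
        if any((instance_dir / "images").glob("*.jpg")):
            yield instance_dir


if __name__ == "__main__":
    kept = list(instances_with_masks("./MVImgNet"))
    print(f"{len(kept)} instances have segmentation masks")
```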

trungpx commented 2 months ago

Thanks so much for your reply. Could you help clarify a point of confusion below?

In the MVImgNet paper, Table 1 lists 104,261 segmentations. (Figure 1: Table 1 from the MVImgNet paper.)

In the AnyDoor paper, Table 1 lists the following. (Figure 2: Table 1 from the AnyDoor paper.)

Does this mean AnyDoor used all 104,261 segmentations, which correspond to the 219,188 videos? Could you share an estimate of how many videos were used, so that I can download only the relevant ones? Looking at their dataset, it contains many huge files, and downloading all of them would be very heavy. A rough back-of-envelope estimate is sketched below.
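
For reference, here is my rough estimate of the download size, assuming one segmentation per video and roughly uniform storage per video (both are assumptions, not confirmed by the dataset authors):

```python
# Back-of-envelope: what fraction of MVImgNet has masks, and how big is it?
# Assumes segmentations map one-to-one to videos and storage scales uniformly.
videos_total = 219_188   # videos in full MVImgNet
segmentations = 104_261  # segmentations listed in MVImgNet Table 1
size_tb = 3.4            # total size of the full dataset

fraction = segmentations / videos_total  # ~0.48
print(f"~{fraction:.0%} of videos -> ~{fraction * size_tb:.1f} TB to download")
```

Under those assumptions the masked subset would still be around 1.6 TB, which is why I would like to download only the videos actually used.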

(Figure 3: MVImgNet dataset download page.)