training dataloader issue

frank-xwang / InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/

Apache License 2.0

484 stars, 25 forks

training dataloader issue #10

Closed: 2a3b4c closed 7 months ago

2a3b4c commented 7 months ago

How do I get the "projection_matrix" file?

frank-xwang commented 7 months ago

Hi, projection_matrix (768x768) is the CLIP projection matrix, i.e. the weight.data of the Linear layer defined in CLIP, with shape (out_dim, in_dim). We didn't actually use the image embedding, so you can comment out this line. Let me know if you have further questions.

frank-xwang commented 7 months ago

Update: I have pushed the code with this line commented out. Please let me know if you run into other issues.

2a3b4c commented 7 months ago

I reviewed the decoded, processed data and found that the caption for every segmentation is empty. Is this normal?

frank-xwang commented 7 months ago

I don't think so, except in cases where all instances are quite small (smaller than 32x32). You have the option to modify this line to if area >= 0*0: to include instance captions for all instances. However, be aware that captions for very small instances might be less accurate. Alternatively, using the category name as the instance caption is a simpler option that might also work.
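A minimal sketch of the two options above, combined for illustration. It is not the repository's actual code: the field names (bbox, caption, category_name), the [x, y, width, height] box layout, and the helper instance_caption are all assumptions.

```python
# Hypothetical field names; the repository's actual data format may differ.
MIN_SIDE = 32  # 32 reproduces the 32x32 threshold; 0 keeps every caption


def instance_caption(instance, min_side=MIN_SIDE):
    """Return the caption to use for a single instance."""
    _, _, width, height = instance["bbox"]  # assumed [x, y, width, height]
    area = width * height
    # Keep the generated caption only for instances that are large enough
    # and that actually have a non-empty caption.
    if area >= min_side * min_side and instance.get("caption"):
        return instance["caption"]
    # Simpler alternative from the comment above: use the category name.
    return instance["category_name"]
```

With min_side=0 the check behaves like if area >= 0*0: and every instance with a non-empty caption keeps it.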

2a3b4c commented 7 months ago

Thanks for your answer. I processed the COCO dataset, and the key "is_det" in the data is 0, so the caption information cannot be obtained from the decoded data. I just wonder what the key "is_det" means, and what o365 means in the corresponding comment "# if it is from detection (such as o365), then we will make a pseudo caption"?

frank-xwang commented 7 months ago

If is_det is 0, the model will use the generated instance captions. Otherwise, it will use the category name (from ground-truth or an object detection model) as the pseudo instance caption.
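The rule can be sketched roughly as follows. The function select_caption and its arguments are hypothetical names for illustration, not the repository's code. The o365 in the quoted comment presumably refers to a detection dataset such as Objects365, which provides category labels but no captions.

```python
def select_caption(is_det, generated_caption, category_name):
    """Pick the instance caption according to the is_det flag."""
    if is_det:
        # Detection-style data: use the category name as a pseudo caption.
        return category_name
    # is_det == 0: use the generated instance caption.
    return generated_caption
```

Under this reading, data with is_det set to 0 depends entirely on the generated captions, so an empty generated caption stays empty unless a fallback is added.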