OFA-Sys / ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Apache License 2.0
964 stars, 63 forks

Kernel died #46

Closed apking2000 closed 9 months ago

apking2000 commented 10 months ago

import torch
from one_peace.models import from_pretrained

device = "cuda" if torch.cuda.is_available() else "cpu"
model = from_pretrained("ONE-PEACE", device=device, dtype="float32")

When I try to load the model, my kernel dies. Can you tell me the hardware requirements, such as system memory and GPU memory?

logicwong commented 10 months ago

We only tested on a 40GB A100. You can try setting dtype='float16'.
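For a rough sense of why switching the dtype helps, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It assumes roughly 4B parameters (the figure reported in the ONE-PEACE paper) and ignores activations, the CUDA context, and any temporary host-RAM copy made while the checkpoint is loaded, so real usage will be higher. The helper name `weight_memory_gib` is made up for this sketch.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Assumption: ~4B parameters, as reported in the ONE-PEACE paper.
NUM_PARAMS = 4_000_000_000


def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone, in GiB (no activations or overhead)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16"):
    gib = weight_memory_gib(NUM_PARAMS, dtype)
    print(f"{dtype}: ~{gib:.1f} GiB for weights alone")
```

Under these assumptions the weights need roughly 15 GiB in float32 and about half that in float16. A kernel that dies during loading may also be running out of system RAM rather than GPU memory, so both are worth checking.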

apking2000 commented 9 months ago

Hi, can you tell me how to use multiple modalities together for retrieval, for example text+image to image? It would be very helpful for me.

Thanks, Aakash


apking2000 commented 9 months ago

Specifically, I want to know how to combine multiple embeddings to retrieve image data.


logicwong commented 9 months ago

@apking2000 For guidance on implementing text+image to image retrieval, you can refer to multi-modal-embedding. To do this, compute the mean of text_features and image_features, then use the combined features to retrieve images.
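A minimal sketch of the averaging approach described above, using plain Python lists in place of the tensors the model would return. The helper names (`l2_normalize`, `combine_features`, `retrieve`) and the toy vectors are made up for illustration, and re-normalizing the mean before ranking by cosine similarity is an assumption rather than something confirmed in this thread.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def combine_features(text_features, image_features):
    """Average the two query embeddings, then re-normalize (assumption)."""
    mean = [(t + i) / 2 for t, i in zip(text_features, image_features)]
    return l2_normalize(mean)


def retrieve(query, gallery, top_k=3):
    """Return gallery indices sorted by cosine similarity to the query."""
    gallery = [l2_normalize(row) for row in gallery]
    scores = [sum(q * g for q, g in zip(query, row)) for row in gallery]
    ranked = sorted(range(len(gallery)), key=lambda idx: scores[idx], reverse=True)
    return ranked[:top_k]


# Toy stand-ins for text_features and image_features of one query.
text_features = l2_normalize([1.0, 0.0, 0.0])
image_features = l2_normalize([0.0, 1.0, 0.0])

# Toy gallery of candidate image embeddings.
gallery = [
    [1.0, 1.0, 0.0],  # close to both the text and the image query
    [1.0, 0.0, 0.0],  # close to the text query only
    [0.0, 0.0, 1.0],  # unrelated to either
]

query = combine_features(text_features, image_features)
ranking = retrieve(query, gallery, top_k=3)
print(ranking)
```

With real embeddings the same steps would normally be done as batched tensor operations; the gallery entry that matches both the text and the image query ranks first here because the averaged query sits between the two.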