image-text pre-trained

I have released two sets of pre-trained weights in this repository. The first is a pure vision encoder based on ResNet50. It follows the standard ResNet50 architecture from the TorchVision library, except that the final fully connected (FC) layer is omitted. These weights can be downloaded from this link and fine-tuned directly on downstream tasks.

The second set of weights is for joint text-image representation and is hosted on Google Drive. Such weights are typically used for zero-shot tasks, so I have included scripts and code for zero-shot classification in the repository.
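As a toy illustration of the general idea behind zero-shot classification with joint embeddings (not the repository's actual script), the sketch below scores one image embedding against one text embedding per class using cosine similarity and a softmax. The function names, the small hand-written vectors, and the temperature of 0.07 are assumptions for illustration; real embeddings would come from the image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Return the best label and a softmax over similarity scores.

    text_embs maps each class label to the embedding of its text prompt.
    The temperature value is an assumed, illustrative default.
    """
    labels = list(text_embs)
    logits = [cosine(image_emb, text_embs[label]) / temperature for label in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hand-written placeholder vectors standing in for encoder outputs.
image_emb = [1.0, 0.0]
text_embs = {"class_a": [0.9, 0.1], "class_b": [0.0, 1.0]}
best, probs = zero_shot_classify(image_emb, text_embs)
print(best, probs)
```

No classifier is trained here: the class set is defined only by the text prompts, which is what makes the approach zero-shot.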

QtacierP / PRIOR

image-text pre-trained #7