Open Tzx11 opened 7 months ago
In this repository, I have made available two sets of pre-trained weights to facilitate further research and application development. The first set consists of pure vision-encoder weights based on the ResNet50 architecture. It follows the standard ResNet50 architecture from the TorchVision library, except for the last fully connected (FC) layer. These weights can be freely accessed and downloaded from this link. They can be fine-tuned directly on various downstream tasks.
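As a rough illustration of the fine-tuning setup described above, here is a minimal sketch of loading encoder weights into a TorchVision ResNet50 and attaching a fresh classification head. It assumes the downloaded file is a flat state dict whose keys match TorchVision's ResNet50 naming. The checkpoint path, the helper names `drop_fc_keys` and `build_finetune_model`, and the `num_classes` parameter are all hypothetical and not taken from the repository.

```python
def drop_fc_keys(state_dict):
    """Return a copy of the state dict without entries for the final FC layer."""
    return {k: v for k, v in state_dict.items() if not k.startswith("fc.")}


def build_finetune_model(checkpoint_path, num_classes):
    """Load encoder weights into a TorchVision ResNet50 and add a new head.

    Assumption: checkpoint_path points to a flat state dict with
    TorchVision-style key names (conv1.weight, layer1.0.conv1.weight, ...).
    """
    # Imported here so drop_fc_keys stays usable without PyTorch installed.
    import torch
    from torchvision.models import resnet50

    model = resnet50(weights=None)  # randomly initialised backbone
    state_dict = torch.load(checkpoint_path, map_location="cpu")

    # strict=False tolerates the missing fc.* entries; check the report
    # to confirm that nothing else failed to match.
    report = model.load_state_dict(drop_fc_keys(state_dict), strict=False)
    print("missing keys:", report.missing_keys)
    print("unexpected keys:", report.unexpected_keys)

    # Replace the head with one sized for the downstream task.
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model
```

If the checkpoint wraps the weights in an outer dictionary or uses a key prefix, the keys would need to be unwrapped or renamed before calling `load_state_dict`.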
The second set of weights is designed for text-image joint representation and is hosted on Google Drive. Such weights are typically used for zero-shot tasks. To support this, I have included scripts and code in the repository for zero-shot classification.
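For readers unfamiliar with zero-shot classification using a joint text-image representation, the general idea can be sketched as follows: embed the image, embed one text prompt per class, and pick the class whose text embedding is most similar to the image embedding. This is a generic illustration, not the repository's script. The function names, the toy embeddings, and the temperature value of 0.07 are assumptions chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Pick the label whose text embedding best matches the image embedding.

    text_embeddings maps each class label to the embedding of its prompt.
    The temperature is an assumed value; a trained model would supply its own.
    """
    labels = list(text_embeddings)
    logits = [cosine(image_embedding, text_embeddings[l]) / temperature
              for l in labels]

    # Numerically stable softmax over the scaled similarities.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {l: e / total for l, e in zip(labels, exps)}

    best = max(probs, key=probs.get)
    return best, probs


# Toy 2-D embeddings standing in for real encoder outputs.
label, probs = zero_shot_classify(
    [1.0, 0.0],
    {"pneumonia": [0.9, 0.1], "normal": [0.0, 1.0]},
)
print(label, probs)
```

In practice the embeddings would come from the image and text encoders, and the prompts would be class descriptions written in natural language.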
Nice work! I have two questions. After reading the paper, my understanding is that the prior consists of an image encoder and a text encoder. So do the image-text pre-trained weights contain only the image encoder and text encoder weights? And how do I load the image-text pre-trained weights into models for other medical downstream tasks?
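One way to investigate both questions is to list the checkpoint's keys and group them by prefix, which shows which sub-modules the file contains and lets the image-encoder part be extracted for a downstream model. The sketch below assumes the checkpoint is a flat mapping from parameter names to tensors. The prefixes `image_encoder.` and `text_encoder.` are hypothetical placeholders; the real names would have to be read from the actual file.

```python
def split_by_prefix(state_dict, prefixes):
    """Group checkpoint entries by key prefix and strip the prefix.

    Returns (parts, leftover): parts maps each prefix to its sub-dict,
    and leftover holds entries that matched no prefix.
    """
    parts = {prefix: {} for prefix in prefixes}
    leftover = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                parts[prefix][key[len(prefix):]] = value
                break
        else:
            # No prefix matched, so keep the entry under its original name.
            leftover[key] = value
    return parts, leftover


# Toy checkpoint with hypothetical key names; real values would be tensors.
checkpoint = {
    "image_encoder.conv1.weight": "w0",
    "image_encoder.layer1.0.conv1.weight": "w1",
    "text_encoder.embeddings.weight": "w2",
    "logit_scale": "s",
}
parts, leftover = split_by_prefix(checkpoint, ["image_encoder.", "text_encoder."])
print(sorted(parts["image_encoder."]))
print(sorted(leftover))
```

With a real file, the dictionary would typically come from `torch.load(path, map_location="cpu")`, and the extracted image-encoder sub-dict could then be passed to the downstream model's `load_state_dict`, using `strict=False` if the task head differs.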