Ultraman6 opened this issue 2 weeks ago
Hello,
You are right, my implementation adapts the encoder only. I thought it would be wise to adapt the feature extractor, which is the encoder. I would expect adapting the mask decoder as well to give similar results, so I don't understand why your results look like this.
As for the prompt encoder, I am not sure it is necessary to adapt it.
I see that the LoRA fine-tuning in your code only tunes the parameters of SAM's image encoder. Is it important to also adapt the downstream prompt encoder and mask decoder? When I tried to extend the fine-tuning to those blocks, the result came out like that. Should the training period be longer than when fine-tuning only the LoRA layers?
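For reference, here is a minimal sketch of what "LoRA on the image encoder only, with an optionally trainable mask decoder" could look like. It assumes the official `segment_anything` package and a ViT-B checkpoint; the names `LoRALinear`, `build_lora_sam`, and the `rank`/`alpha` defaults are illustrative, not the repo's actual code.

```python
# Minimal sketch (not the repo's implementation) of injecting LoRA into SAM's
# image-encoder attention layers, with an optional flag to also train the
# mask decoder. Assumes facebookresearch/segment-anything is installed.
import math
import torch
import torch.nn as nn
from segment_anything import sam_model_registry


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        self.scale = alpha / rank
        nn.init.kaiming_uniform_(self.lora_a.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_b.weight)  # start as a no-op so training begins from the pretrained weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def build_lora_sam(checkpoint: str, rank: int = 4, train_mask_decoder: bool = False):
    # Hypothetical helper: load SAM, freeze it, then add LoRA to the encoder.
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)

    # Freeze all pretrained weights; only LoRA (and optionally the decoder) will train.
    for p in sam.parameters():
        p.requires_grad = False

    # Inject LoRA into the qkv projection of every image-encoder transformer block.
    for blk in sam.image_encoder.blocks:
        blk.attn.qkv = LoRALinear(blk.attn.qkv, rank=rank)

    # Optionally unfreeze the mask decoder as well (the extension discussed in this issue).
    if train_mask_decoder:
        for p in sam.mask_decoder.parameters():
            p.requires_grad = True

    return sam
```

One practical note on the question above: unfreezing the mask decoder adds many more trainable parameters than the LoRA layers alone, so it is plausible that the same schedule (steps and learning rate) that works for encoder-only LoRA would need to be retuned, but that is a guess rather than something established in this thread.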