reproducing fine-tuned tokens. (textual inversion)

bansh123 commented 8 months ago

I have attempted to reproduce the results of the few-shot classification task on the PASCAL VOC dataset. I managed to achieve comparable outcomes when utilizing the fine-tuned tokens you previously shared via the Google Drive link. However, I was unsuccessful in reproducing the fine-tuned tokens. When employing fine_tune.py and aggregate_embeddings.py with the provided scripts, I obtained inferior tokens, resulting in significantly lower accuracy (approximately a 10% gap in 1-shot).

Am I overlooking something?

brandontrabucco commented 8 months ago

Hello bansh123,

Thanks for bringing this to my attention. I'll revisit this experiment to determine why you are seeing this behavior.

In the meantime, for the latest paper update on OpenReview, we used fine_tune_upstream.py, initialized each token with the class name of the object, and trained 4 vectors per concept in the dataset via the --num_vectors argument.

We trained these tokens with Stable Diffusion 1.4, a batch size of 8, 2000 gradient descent steps, and a learning rate of 5.0e-04, scaled by the effective batch size via the --scale_lr argument to the script.

Which figure are you working to reproduce?

-Brandon
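
As a rough sketch of how the settings above combine: assuming --scale_lr follows the convention of the Hugging Face diffusers textual inversion example (base learning rate multiplied by gradient accumulation steps, per-device batch size, and number of processes), and assuming a single process with no gradient accumulation (neither is stated in this thread), the effective learning rate would work out as below. The function name is hypothetical and is not part of the repository.

```python
def effective_learning_rate(base_lr, batch_size, grad_accum_steps=1, num_processes=1):
    """Scale a base learning rate by the effective batch size.

    Assumption: this mirrors the --scale_lr convention of the diffusers
    textual inversion example; fine_tune_upstream.py may differ.
    """
    return base_lr * grad_accum_steps * batch_size * num_processes


# Values quoted in the thread: learning rate 5.0e-04, batch size 8.
# Single process and no gradient accumulation are assumptions.
lr = effective_learning_rate(base_lr=5.0e-04, batch_size=8)
print(f"{lr:.1e}")  # prints 4.0e-03
```

If training runs on several GPUs or with gradient accumulation, the scaled value changes accordingly, which may be one source of mismatch when reproducing the tokens.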

bansh123 commented 8 months ago

Thank you for your response. I am currently working on reproducing Figure 5, and I plan to tackle Figure 8 subsequently. I will conduct the experiments again, adhering to your configuration. Thank you!

JiaojiaoYe1994 commented 8 months ago

Hi, I am trying to reproduce Figure 5. Could you provide some hints or scripts for the visualization, since multiple methods have to be evaluated together?

jsw6872 commented 8 months ago

I've tried using the PASCAL tokens provided by the author, but I can't reproduce the performance. If you don't mind me asking, could you share the seed or hyperparameters you used?

[Image: visualization_pascal_overall]

Thank you.

zhixiongzh commented 7 months ago

Hi @brandontrabucco,

Would it be possible to share, via Google Drive, the new embeddings obtained by initializing with the original class name and setting --num_vectors to 4? Recomputing the embeddings takes a lot of compute, especially for datasets with more classes.

Thanks in advance for considering this request.

zhixiongzh commented 7 months ago

@brandontrabucco, another question: how do you avoid the ValueError when using the class name of the object as the initialization, given the following check in fine_tune_upstream.py?

    if len(token_ids) > 1:
        raise ValueError("The initializer token must be a single token.")

Some original class names consist of multiple words and therefore tokenize to more than one token, e.g. pink primrose in the Flowers102 dataset.
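
One possible workaround, not confirmed by the author in this thread, is to fall back to a single word of the class name that does tokenize to exactly one token. The sketch below is only an illustration under that assumption: `pick_single_token_initializer`, `toy_encode`, and `TOY_VOCAB` are hypothetical names, and the toy encoder stands in for a real tokenizer call such as `tokenizer.encode(text, add_special_tokens=False)`.

```python
from typing import Callable, List


def pick_single_token_initializer(
    class_name: str,
    encode: Callable[[str], List[int]],
) -> str:
    """Return a piece of class_name that encodes to exactly one token.

    `encode` maps text to token ids without special tokens.
    """
    # Try the full name first, then single words from last to first,
    # since the head noun of a class name is often the last word.
    candidates = [class_name] + class_name.split()[::-1]
    for candidate in candidates:
        if len(encode(candidate)) == 1:
            return candidate
    raise ValueError(f"No single-token initializer found for {class_name!r}")


# Toy stand-in for a real tokenizer (hypothetical vocabulary):
# known words map to one id, unknown words split into one id per character.
TOY_VOCAB = {"pink": 1, "rose": 2, "dog": 3}


def toy_encode(text: str) -> List[int]:
    ids: List[int] = []
    for word in text.split():
        if word in TOY_VOCAB:
            ids.append(TOY_VOCAB[word])
        else:
            ids.extend(100 + ord(c) for c in word)
    return ids


print(pick_single_token_initializer("dog", toy_encode))
print(pick_single_token_initializer("pink primrose", toy_encode))
```

With the toy vocabulary, "pink primrose" falls back to "pink" because "primrose" is not a single token. Whether such a fallback gives a good initialization for a given class is a separate question that this sketch does not address.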

tanriverdiege commented 4 months ago

@bansh123 Did you manage to reproduce the results? I am having trouble getting the results presented in the paper (Figure 5, to be specific). Do you have any tips or scripts that you could share to help me? Thank you.

tanriverdiege commented 4 months ago

@jsw6872 Did you manage to reproduce the results?

tanriverdiege commented 4 months ago

Regarding the ValueError raised by the single-token check in fine_tune_upstream.py for class names that span multiple tokens (e.g. pink primrose in Flowers102): how did you resolve the issue? I would really appreciate your help. Thanks!

brandontrabucco / da-fusion

reproducing fine-tuned tokens. (textual inversion) #24