amazon-science / sccl

PyTorch implementation of "Supporting Clustering with Contrastive Learning" (NAACL 2021)
MIT No Attribution

Can't achieve the scores in the paper #3

Closed PaffxAroma closed 2 years ago

PaffxAroma commented 3 years ago

I'm trying to reproduce the paper, but I can't reach 0.85 ACC running this code on GoogleNews-S. All hyper-parameters are set to the same values as in the paper, and the data is augmented with contextual augmentation. The results show that the representation evaluated with K-Means performs better, at 0.75 ACC, while the model's clustering head only reaches 0.62 ACC. When I increase the clustering-head learning rate, the model's result still stays around 0.62. What should I do to improve this?

yanhan19940405 commented 3 years ago

I also did not achieve good results. After visualizing the embedding space, I found that the sentence embedding matrix generated by SCCL is not clearly discriminative. What happened in your follow-up? [attached image: embedding visualization]

Dejiao2018 commented 3 years ago

Thanks for your interest in our work @PaffxAroma. To your questions:

1. The reported accuracy on GoogleNews-S is 83.1, not 85.
2. As stated in the paper, ACC is reported from the KMeans clustering results (a minimal sketch of that evaluation follows below).
3. You should also check how the clustering accuracy changes over the course of training; an arbitrarily long training run will degrade performance.
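For concreteness, a minimal sketch of that evaluation: run KMeans on the learned embeddings and score the clusters against the ground-truth labels with the Hungarian algorithm (`embeddings` and `labels` are placeholders, not names from the repo):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one cluster-to-label mapping."""
    k = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    rows, cols = linear_sum_assignment(cost.max() - cost)  # maximize matched counts
    return cost[rows, cols].sum() / y_true.size

# embeddings: (n, d) sentence embeddings from the trained encoder;
# labels: (n,) ground-truth classes -- both placeholders.
y_pred = KMeans(n_clusters=len(np.unique(labels)), n_init=20).fit_predict(embeddings)
print("ACC:", clustering_accuracy(labels, y_pred))
```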

@yanhan19940405, can you provide more context about your plot? Is it a t-SNE visualization? If so, why is there only one color? Please refer to my answer to your original question. Thanks.

yanhan19940405 commented 3 years ago

Hello, the plot was not produced with t-SNE. Instead, following the BERT-flow paper, all of the original validation data were encoded with the trained SCCL model to obtain sentence embeddings. The sentence embedding matrix has dimension (m, 128), and the sample distribution map was drawn by using PCA to reduce the 128 feature dimensions down to 2, roughly as in the sketch below.
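A minimal sketch of that PCA visualization (`embeddings` is a placeholder for the (m, 128) matrix produced by the trained model):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the (m, 128) SCCL sentence embeddings to 2-D and scatter-plot them.
coords = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.title("PCA of SCCL sentence embeddings")
plt.show()
```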

yanhan19940405 commented 3 years ago

In addition, the opinion data is derived from news data, with binary labels (does the text contain opinion information or not) that were manually corrected. After manual proofreading, a supervised classification model reaches an F1 of around 0.91 on this data. But the data turns out to be ineffective when used with SCCL. For data augmentation we used back translation via the Google translation engine.
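For reference, a hedged sketch of back-translation augmentation; this version uses Hugging Face MarianMT models rather than the Google translation engine mentioned above, but implements the same idea (en -> de -> en):

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(sentences,
                   src="Helsinki-NLP/opus-mt-en-de",
                   tgt="Helsinki-NLP/opus-mt-de-en"):
    """Augment English sentences by round-tripping them through German."""
    def translate(texts, model_name):
        tok = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
        batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
        out = model.generate(**batch)
        return [tok.decode(t, skip_special_tokens=True) for t in out]

    return translate(translate(sentences, src), tgt)
```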

Dejiao2018 commented 3 years ago

Please refer to Table 3 in our paper. Back translation did not perform well in our experiments, so we do not recommend it for contrastive-learning-based short text clustering.

Also, for your problem, in addition to the data augmentation, I encourage you to check possible causes 1) and 2) in my response to your original question #4, which are more likely to explain the problems you encounter.

As for the sentence embedding matrix, shouldn't it be (m, 768) instead, where m is the batch size? BERT-flow seems to focus on pairwise semantic similarity only, and I'm not sure its claims generalize to categorical data. I would rather encourage a t-SNE plot of the (distil)BERT embeddings, along the lines of the sketch below.
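A minimal sketch of that t-SNE check (`texts` and `labels` are placeholders; coloring by ground-truth label makes cluster separation, or the lack of it, visible):

```python
import matplotlib.pyplot as plt
import torch
from sklearn.manifold import TSNE
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

with torch.no_grad():
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    emb = encoder(**batch).last_hidden_state.mean(dim=1)  # mean pooling, (m, 768)

coords = TSNE(n_components=2).fit_transform(emb.numpy())
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=2, cmap="tab20")
plt.show()
```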

yanhan19940405 commented 3 years ago

Thanks. Yes, we can learn all of this from your paper, but I haven't yet found the data-augmentation details in your code, so for now only the back translation method could be used instead. Besides, even if it performs worse, it should not be completely ineffective; from the distribution map, the samples are highly dense and not distinguishable at all. While reproducing the paper, I extracted the clustering-probability-distribution code from my implementation and replaced it with your original code. I will run a second verification experiment based on what you mentioned. Thank you again for your reply. If the team allows, I will release my code and data before contacting you. Have a nice weekend, thank you.
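For context, the clustering probability distribution being discussed is the DEC-style soft assignment that SCCL's clustering head uses: a Student's t-distribution over distances to learnable cluster centers. A minimal sketch (not the repo's exact code):

```python
import torch

def soft_assign(embeddings, centers, alpha=1.0):
    """Student's t soft assignment: embeddings (m, d), centers (k, d) -> q (m, k)."""
    sq_dist = torch.cdist(embeddings, centers) ** 2
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)  # normalize rows to probabilities
```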

yanhan19940405 commented 3 years ago

Sorry, I just noticed your last reply. Yes, 128 is the embedding size, obtained by a linear transformation from the 768 dimensions, and m represents the total number of samples (I didn't state that clearly; sorry).
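For illustration, a hypothetical sketch of such a transformation: a single linear layer mapping the 768-d encoder output to the 128-d space being visualized (the actual head in any given implementation may differ):

```python
import torch.nn as nn

# Hypothetical projection: 768-d (distil)BERT sentence embedding -> 128-d
projection = nn.Linear(768, 128)
```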

rajat-tech-002 commented 3 years ago

> Thanks for your interest in our work @PaffxAroma. To your questions:
>
> 1. The reported accuracy on GoogleNews-S is 83.1, not 85.
> 2. As stated in the paper, ACC is reported from the KMeans clustering results.
> 3. You should also check how the clustering accuracy changes over the course of training; an arbitrarily long training run will degrade performance.
>
> @yanhan19940405, can you provide more context about your plot? Is it a t-SNE visualization? If so, why is there only one color? Please refer to my answer to your original question. Thanks.

@Dejiao2018, the idea in the paper is quite good; I like the approach. So all the results reported in the paper are obtained with BERT embeddings and KMeans rather than with the clustering head? What was the reason for not reporting results with the cluster head? Was the ACC with the clustering head always lower than with KMeans? Thanks.

1085737319 commented 2 years ago

> I'm trying to reproduce the paper, but I can't reach 0.85 ACC running this code on GoogleNews-S. All hyper-parameters are set to the same values as in the paper, and the data is augmented with contextual augmentation. The results show that the representation evaluated with K-Means performs better, at 0.75 ACC, while the model's clustering head only reaches 0.62 ACC. When I increase the clustering-head learning rate, the model's result still stays around 0.62. What should I do to improve this?

What are the parameter settings for the SearchSnippets dataset?