cora1021 / TreeLoss


Some new tasks #5

Open mikeizbicki opened 2 years ago

mikeizbicki commented 2 years ago

Task 1: building a tree on a projected dataset

Do a new synthetic experiment (like Experiment I in the paper), but with a new hyperparameter on the x-axis. The data generation procedure does not change at all from Section 6.1.1. Instead the procedure for generating the cover tree will change.

Currently, we have a matrix W^* : k \times d that contains the true parameter vectors. We will create a new matrix W' : k \times d' where d' is the new hyperparameter. In order to construct this matrix, first create a "random projection matrix" R : d \times d', then set W' = W^* R. You can create a random projection matrix by sampling each entry of R from a normal distribution (to get a random matrix), then doing the SVD (to get a random orthogonal projection matrix).

The W' matrix should not be involved in the data generation process in any way. This procedure, for example, should still work on the real world datasets. Instead, the W' matrix is used only for generating the distances between the classes when constructing the cover tree. The distances between rows of the W' matrix are approximately, but not exactly, the distances between the corresponding rows of the W^* matrix, because of the projection. Due to the much lower dimension of the W' matrix, however, the theoretical guarantees are stronger.
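Roughly, in numpy (this is only a sketch of the construction, not code from the repo; the variable names and the placeholder W^* are mine):

```python
import numpy as np
from scipy.spatial.distance import cdist

def random_projection_matrix(d, d_proj, seed=0):
    """Sample a d x d_proj matrix with orthonormal columns: draw entries
    from a standard normal, then orthonormalize them with the SVD."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, d_proj))
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt  # d x d_proj with orthonormal columns

# Placeholder for the true k x d parameter matrix W^* from the data generation step.
k, d, d_proj = 1000, 1000, 32
W_star = np.random.default_rng(1).standard_normal((k, d))

R = random_projection_matrix(d, d_proj)
W_proj = W_star @ R                  # k x d_proj; used ONLY for the cover tree
class_dists = cdist(W_proj, W_proj)  # k x k distances between the classes
```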

The x-axis of the plot should be d' varied from 1 to d, and the y-axis should be accuracy. We'll need large k and d in order to observe any effects. Maybe k=d=1000 will work, but it might need to be even larger.
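Continuing the sketch above, the sweep for the plot might look like the following, where `run_synthetic_experiment` is a hypothetical stand-in for the existing Experiment I pipeline:

```python
d_proj_values = range(1, d + 1, 50)  # x-axis: d' swept from 1 up to d

def run_synthetic_experiment(class_distances):
    """Placeholder for the existing Experiment I code: build the cover tree
    from these class distances, train with the tree loss, return test accuracy."""
    raise NotImplementedError

accuracies = []
for d_proj in d_proj_values:
    R = random_projection_matrix(d, d_proj)
    W_proj = W_star @ R
    accuracies.append(run_synthetic_experiment(cdist(W_proj, W_proj)))
```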

Task 2: word embeddings

The fasttext model has word vectors that have been aligned so that words from 44 languages are all in the same vector space. See: https://fasttext.cc/docs/en/aligned-vectors.html . I would like to get the emoji2vec embeddings aligned into this vector space as well. For example, the word "happy" and the happy face emoji 😀 should have similar embedding vectors, and the word "sad" and the crying face emoji 😢 should have similar embeddings.
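One standard way to do this kind of alignment is orthogonal Procrustes on a dictionary of paired vectors; here is a rough sketch with placeholder data (not actual emoji2vec/fasttext vectors):

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix Q minimizing ||X @ Q - Y||_F,
    so that X @ Q lands as close as possible to Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Placeholders: in practice X holds emoji2vec vectors and Y holds the
# fasttext vectors they should land near (e.g. 😀 paired with "happy").
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 300))  # source space (emoji2vec)
Y = rng.standard_normal((500, 300))  # target space (aligned fasttext)
Q = procrustes_align(X, Y)
emoji_in_fasttext_space = X @ Q
```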

cora1021 commented 2 years ago

Questions about Task 1:

  1. Generating W' : k' \times d as a distance matrix for the cover tree: each row of W' represents one class, so k' should be equal to k. How can k' be a hyperparameter varied from 1 to k? I guess R : d \times k' and W' = W^* R?
  2. Why do we need the SVD to get a random matrix? Can we just generate a matrix from a random normal distribution?

mikeizbicki commented 2 years ago
  1. Whoops, you're absolutely right. I had some typos in the description that confused d' and k'. I've gone through and edited the original comment to fix these, so hopefully that answers the question.
  2. Generating from a random normal distribution gives a random matrix, but not a random orthogonal projection matrix. (An orthogonal projection matrix P satisfies P = PP.) Doing the SVD lets you convert the random normal matrix into one with this property. It turns out that in the limit as the dimension of the matrix goes to infinity, a random normal matrix basically becomes an orthogonal projection with probability 1, so I would expect to get similar results without the SVD step. But it will be most accurate with the SVD step, especially when d or d' is small.
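A quick numpy check of that property (not repo code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))        # random normal matrix, d=50, d'=10
U, _, Vt = np.linalg.svd(A, full_matrices=False)
R = U @ Vt                               # columns are now orthonormal
P = R @ R.T                              # orthogonal projection onto range(R)
print(np.allclose(P, P @ P))             # True: P = PP
print(np.allclose(R.T @ R, np.eye(10)))  # True: orthonormal columns
```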
cora1021 commented 2 years ago

About the second question: I understand what you mean about using the SVD to get an orthogonal projection matrix. My concern is that, if so, the orthogonal projection matrix is a square matrix R : d \times d, not R : d \times d'.

mikeizbicki commented 2 years ago

You take the first d' columns in order to get the d \times d' matrix. That will give us d' directions that are sampled uniformly at random from the unit hypersphere.

cora1021 commented 2 years ago

For the emoji embeddings: in order to align the emoji vectors into the same space as fasttext, we need a dictionary between the emoji and the fasttext vocabulary. Starting with English, I take the Unicode description of each emoji and sum the fasttext embeddings of every word in that description. So our source vectors are the emoji embeddings from emoji2vec, and our target vectors are the embeddings of the emoji descriptions. I deleted the emojis whose description words cannot be found in fasttext, which leaves 1119 emojis.

After getting the aligned emoji embeddings, I found the most similar word for each emoji. I created a new folder named 'emoji_embedding' and uploaded two txt files. The file named 'results' contains the emoji, the most similar word, and the Unicode description. I also checked the most similar Chinese words, and they look good to me. For the file 'results_sum', I instead use the target vectors (the embeddings of the emoji descriptions) directly as the emoji embeddings and find the most similar word. The two files look similar, but with small differences. We can discuss the method I used and the differences between the two files at tomorrow's meeting.
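For reference, the most-similar-word lookup is just cosine similarity against the fasttext vocabulary; a sketch (variable names assumed, not the actual script):

```python
import numpy as np

def most_similar_word(emoji_vec, word_vecs):
    """Return the vocabulary word whose fasttext vector has the highest
    cosine similarity to the aligned emoji vector.
    word_vecs is a {word: np.ndarray} dictionary."""
    best_word, best_sim = None, -np.inf
    e = emoji_vec / np.linalg.norm(emoji_vec)
    for word, vec in word_vecs.items():
        sim = float(e @ vec) / np.linalg.norm(vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word
```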

cora1021 commented 2 years ago

About tree loss on YOLO9000: I just want to check that I understand correctly. Recently I read the YOLO9000 paper and the code. YOLO9000 is a model with a joint training strategy for classification and object detection: it was trained on ImageNet for the classification task and on COCO for the object detection task. YOLO9000 uses a WordTree to build a hierarchical relation between the classes of ImageNet and COCO. My understanding is that we can replace the WordTree with a cover tree and use the tree loss instead of the cross entropy loss in the classification task to train YOLO9000. I don't think I can change anything about the loss function of the object detection task.

If I am wrong, please let me know. Happy new year!

mikeizbicki commented 2 years ago

Yes, as a start we can just replace the cross entropy with the tree loss. I think there might be more we can do too, but we'll look at that after we have results for the simplest possible thing.
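Concretely, the smallest possible change would look roughly like the sketch below (the `tree_loss` interface here is an assumption for illustration, not the repo's actual API):

```python
import torch.nn.functional as F

def yolo9000_classification_loss(logits, labels, tree_loss=None):
    """Classification branch only; the detection losses stay unchanged."""
    if tree_loss is None:
        return F.cross_entropy(logits, labels)  # original YOLO9000 loss
    return tree_loss(logits, labels)            # drop-in tree loss replacement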

cora1021 commented 2 years ago

Some updates over the winter break:

  1. TreeLoss on ImageNet dataset:
  2. Two new thoughts, from some NLP papers I read recently.

So far my thoughts are still very trivial and not novel enough.