facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.

Evaluation metric hit@k not a good fit for multi-label classification #240

Open mayankg53 opened 5 years ago

mayankg53 commented 5 years ago

Hi,

I am using the weight feature in both the input and label space, as described in the README: word_1:wt_1 word_2:wt_2 ... word_k:wt_k label1:lwt_1 ... labelr:lwt_r

Questions: each of my training lines looks like this: FOOD:0.52 INDUSTRY:0.52 STOCK:0.44 PRICES:0.44 COMPANY:0.32 EARNINGS:0.32 RANKINGS:0.30 MEATS:0.30 label42:0.8 label21:0.75 label30:0.8 label31:1.0 label39:1.0 label14:1.0

  1. Is this the correct format for the training file?

  2. If yes: when my first term is really a 2-gram, as with FOOD INDUSTRY here, do I need to split it and assign each word its own weight as I have done above, or is there a better way to retain the multi-word information while still assigning a weight?

  3. If I need both words and weights in the labels, I am trying to combine the two formats word_1 word_2 ... word_k label_1_word_1 label_1_word_2 ... label_r_word_1 ... and word_1:wt_1 word_2:wt_2 ... word_k:wt_k label1:lwt_1 ... labelr:lwt_r. Is a format like label_1_word_1:wt_1 supported, and if not, what format should I use in that case?

  4. I was able to train this file with trainMode = 0 and fileFormat = 'fastText'. However, when I test, the RHS contains only a single label, and the ++/-- evaluation is done on that label alone. Ideally, when there are multiple labels, all labels with the highest weights should be included in the RHS before the evaluation is performed. As it stands, the hit@k metric does not report a meaningful number for a multi-label classification problem. Am I doing something wrong, or is this a limitation of the evaluation metric? Can I modify the existing metric, and if so, how?

LAV42 commented 5 years ago
  1. Your training format seems fine, although I have a hard time understanding what you are trying to do.

2&3. I haven't experimented with weighted words, so I'll leave those to someone else.

  4. You will have to make your own evaluation tool for the multi-label case. A good strategy: instead of using the evaluation script, obtain the scalar product (or cosine similarity) of your sentences against every label, then evaluate which threshold on each category gives you a fair precision/recall (see the sketch below).
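A minimal sketch of that strategy, assuming the embeddings were exported via StarSpace's TSV output (model.tsv) and that labels carry the default __label__ prefix; the helper names and threshold values here are illustrative, not part of StarSpace:

```python
import numpy as np

def load_tsv_embeddings(path):
    # Each row of a StarSpace model.tsv: entity \t v1 \t v2 ...
    emb = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            emb[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return emb

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

emb = load_tsv_embeddings("model.tsv")
labels = [k for k in emb if k.startswith("__label__")]

def score_sentence(weighted_words):
    # weighted_words: [(word, weight), ...] for one LHS line;
    # bag of features = weighted sum of the word vectors.
    vecs = [w * emb[word] for word, w in weighted_words if word in emb]
    sent = np.sum(vecs, axis=0)
    return {lab: cosine(sent, emb[lab]) for lab in labels}

# Per-label thresholds (tuned on held-out data) turn the scores into a
# multi-label prediction instead of a single top-1 hit.
thresholds = {lab: 0.5 for lab in labels}  # illustrative values
scores = score_sentence([("FOOD", 0.52), ("INDUSTRY", 0.52), ("STOCK", 0.44)])
predicted = [lab for lab, s in scores.items() if s >= thresholds[lab]]
```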
mayankg53 commented 5 years ago
> 1. Your training format seems fine, although I have a hard time understanding what you are trying to do.
>
> 2&3. I haven't experimented with weighted words, so I'll leave those to someone else.
>
> 4. You will have to make your own evaluation tool for the multi-label case. A good strategy: instead of using the evaluation script, obtain the scalar product (or cosine similarity) of your sentences against every label, then evaluate which threshold on each category gives you a fair precision/recall.

Please let me know if the problem statement is unclear.

LAV42 commented 5 years ago

The cosine similarity is how your embedding is learnt, so if you think that (LHS, label) cosine similarity is not a good metric, you're probably using the wrong tool for the job.

mayankg53 commented 5 years ago

> The cosine similarity is how your embedding is learnt, so if you think that (LHS, label) cosine similarity is not a good metric, you're probably using the wrong tool for the job.

For context: I am trying to map a large set of words (approx. 10k) onto a small set of labels (approx. 100). So in my example, the LHS terms are mapped to RHS labels, and I have attached a weight to each term and label on both sides. That is why computing the cosine similarity of the whole sentence did not seem to work for me; do you think anything else is applicable?

I agree with you that cosine similarity is the right measure here. But my question is: with N terms on the LHS and M labels on the RHS, I would need to calculate N*M similarity measures. How do I then determine which group of RHS labels best matches the combination of all LHS terms? I am doing multi-label classification, not multi-class.
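A note on the N*M concern, sketched under the assumption that trainMode 0 uses a bag-of-features LHS (the weighted terms are summed into a single sentence vector before any comparison): each example then costs only M label similarities, and the multi-label prediction is every label whose score clears its threshold, rather than just the argmax. Reusing score_sentence and thresholds from the sketch above:

```python
def predict_labels(weighted_words):
    # The N weighted LHS terms collapse into one sentence vector inside
    # score_sentence, so this is M similarity computations, not N*M.
    scores = score_sentence(weighted_words)
    return {lab for lab, s in scores.items() if s >= thresholds[lab]}
```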

ledw commented 5 years ago

@mayankg53 Hi, to answer your questions:

  1. Yes, the format is correct.
  2. No, there is no support for multi-word information; you need to specify a weight for each word.
  3. Yes, the format is correct.
  4. You need to implement your own evaluation function, as @LAV42 suggested (a sketch follows below).
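For completeness, a minimal sketch of such a custom evaluation: micro-averaged precision/recall/F1 over per-example label sets, which credits every gold label instead of a single top-ranked one. gold_sets would come from the test file and pred_sets from thresholded scores as above; both names are illustrative:

```python
def micro_prf(gold_sets, pred_sets):
    # Micro-averaged precision/recall/F1 over multi-label predictions,
    # a multi-label replacement for hit@k.
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)   # correctly predicted labels
        fp += len(pred - gold)   # predicted but not gold
        fn += len(gold - pred)   # gold labels that were missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. micro_prf([{"label42", "label21"}], [{"label42", "label30"}])
# -> (0.5, 0.5, 0.5)
```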