AdeDZY / DeepCT

DeepCT and HDCT use BERT to generate novel, context-aware bag-of-words term weights for documents and queries.

Question regarding quantization #15

Open thibault-formal opened 3 years ago

thibault-formal commented 3 years ago

Hi @AdeDZY, to get the new term frequencies, you used TF_{DeepCT}(t, d) = round(y_{t,d} * 100). I was wondering if you tried values other than 100?

I did similar experiments on related approaches (roughly, a model learning term weights), and while experimenting with Anserini, I was surprised to notice that increasing the quantization factor (100 in your case) degrades performance; it was actually better to use a low value (like 5)! I agree the models are not the same, but my weights initially lie in a small range like yours (~[0, 3]), so I am curious whether you already tried other values (I don't think it's mentioned in the paper?), or whether you could actually observe some gains by tuning it.
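For concreteness, here is a minimal sketch (not the DeepCT code; the term names and weights below are made up) of how the quantization factor interacts with rounding and can drop low-weight terms:

```python
# Sketch only: quantize predicted term weights y_{t,d} into integer
# pseudo term frequencies via round(y * scale), as in TF_DeepCT.
# Example weights are hypothetical, just to illustrate the effect.

def quantize(weights, scale):
    """Map real-valued term weights to integer TFs; terms that round
    to 0 are effectively removed from the bag-of-words document."""
    return {term: round(w * scale) for term, w in weights.items()}

example = {"neural": 0.92, "ranking": 0.31, "the": 0.004}

for scale in (1, 5, 10, 100):
    tf = quantize(example, scale)
    kept = {t: v for t, v in tf.items() if v > 0}
    print(scale, kept)
# With small scales (1 or 5), low-weight terms round to 0 and disappear;
# larger scales preserve more of the weight distribution at integer resolution.
```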

Thanks, Thibault

AdeDZY commented 3 years ago

This is super interesting! I tried [1, 10, 100, 1000] and found that 100 generally worked best for DeepCT. With small values (e.g., 1 and 10), a lot of words ended up with weight = 0 and were deleted in my setting.

I am wondering what your weight distribution looks like?