choderalab / pinot

Probabilistic Inference for NOvel Therapeutics
MIT License

semi-supervised learning choices #5

Open yuanqing-wang opened 4 years ago

yuanqing-wang commented 4 years ago

@karalets

Sorry there was some delay. I'm also working on a few other projects.

I've been looking into ways of doing semi-supervised learning. The paragraph-vector approach in this paper (https://pubs.rsc.org/en/content/articlepdf/2019/sc/c9sc00616h), which comes from https://arxiv.org/abs/1711.10168, gives me numerical stability issues since it involves a log(sigmoid(·)) term. I switched the dot-product measure in that paper to cosine similarity, but found that initializing from these weights made the ride even bumpier (https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-171836719500) compared to random initialization (https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-120856865376).
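To make the stability point concrete, here is a minimal PyTorch sketch (the function and tensor names are just illustrative, not from the repo): the naive log(sigmoid(·)) underflows to -inf for large-magnitude negative scores, whereas torch.nn.functional.logsigmoid is evaluated stably, and cosine similarity additionally bounds the score to [-1, 1].

```python
import torch
import torch.nn.functional as F

def naive_log_sigmoid_score(z_i, z_j):
    # Naive form: torch.log(torch.sigmoid(x)) underflows to -inf once the
    # dot product is a large negative number, since sigmoid(x) rounds to 0.
    return torch.log(torch.sigmoid((z_i * z_j).sum(dim=-1)))

def stable_log_sigmoid_score(z_i, z_j, use_cosine=False):
    # F.logsigmoid evaluates log(sigmoid(x)) in a numerically stable way,
    # so it stays finite even for large-magnitude scores.
    if use_cosine:
        # Cosine similarity bounds the score to [-1, 1], which sidesteps
        # the issue entirely (at the cost of discarding the norms).
        score = F.cosine_similarity(z_i, z_j, dim=-1)
    else:
        score = (z_i * z_j).sum(dim=-1)
    return F.logsigmoid(score)

z_i = 30.0 * torch.ones(1, 8)
z_j = -30.0 * torch.ones(1, 8)
print(naive_log_sigmoid_score(z_i, z_j))   # tensor([-inf])
print(stable_log_sigmoid_score(z_i, z_j))  # tensor([-7200.])
```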

I will continue to explore more semi-supervised algorithms, but in terms of code structure, I didn't find it hard to work with the existing scripts.

I wrote my semi-supervised loss function here (https://github.com/choderalab/pinot/blob/master/pinot/metrics/semi_supervised.py) and used it to produce weights, which were then fed into the --representation_parameter argument to initialize the supervised learning model.
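Concretely, the handoff looks roughly like this (just a sketch; the module, script name, and file name below are placeholders, not the actual repo layout):

```python
import torch

# Stand-in for the graph-representation network that was pre-trained with the
# semi-supervised loss (the real module lives in pinot; this is a placeholder).
representation = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.Tanh(),
)

# Dump the pre-trained weights to disk ...
torch.save(representation.state_dict(), "semi_supervised_weights.bin")

# ... and hand the file to the supervised run via the existing CLI flag, e.g.
#   python <supervised training script> --representation_parameter semi_supervised_weights.bin
```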

What other ways would you recommend to further make things convenient?

maxentile commented 4 years ago

I don't have any high-level suggestions.

Reviewing the code you linked https://github.com/choderalab/pinot/blob/6785a4edc1ee2cfcd3ebd8588c8213d517ae7bea/pinot/metrics/semi_supervised.py#L37-L56, I do have a couple low-level comments:

yuanqing-wang commented 4 years ago

@maxentile

k is from Eq. 2 of the paper. It's supposed to be a hyperparameter; I dropped it (set it to 1) while trying to get things running. I'll put it back.

The normalization is there because Eq. 2 is approximated with the negative-sampling trick, which introduces a log(sigmoid(·)) term, and that term leads to numerical stability issues.
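For reference, the word2vec-style negative-sampling objective being approximated has the form (my notation, not necessarily the paper's):

$$
\log \sigma\left(\mathbf{u}_o^\top \mathbf{v}_c\right) \;+\; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\left[\log \sigma\left(-\mathbf{u}_{w_i}^\top \mathbf{v}_c\right)\right]
$$

where k is the number of negative samples drawn per positive pair. Every term passes through log σ(·), so a large-magnitude score drives log σ toward -inf, which is where the instability shows up.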

karalets commented 4 years ago

I will look into this over the next couple of days. I was hoping you would get results off the shelf for the initial pass, so that we can work on the infrastructure around the models first.