ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0

Extensions #15

Open pythonometrist opened 5 years ago

pythonometrist commented 5 years ago

Thanks to your help - I have added custom losses, special initialization and a bunch of other things as extensions.

I am now trying to mess with the sentence classification model itself. It is a linear layer on top of the BERT model. What I would like to do is (a) freeze all of BERT, and (b) add a CNN over and above it, along the lines of https://github.com/Shawn1993/cnn-text-classification-pytorch/blob/master/model.py

I want to compare results with a frozen and an unfrozen BERT. Any pointers would be most appreciated.

ThilinaRajapakse commented 5 years ago

Should be pretty similar to adding custom losses. You can freeze all the layers by setting requires_grad = False for all of them in your subclassed model. You can add your convolutional layers to it as well, and define how you want them to be used in the forward method. Hopefully, it won't mess with loading the weights from the pretrained model. I don't think it will.
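For instance, a rough sketch (not code from this repo; the class name, filter sizes, and dropout value are placeholders) of a frozen BERT encoder with a Kim-style CNN head like the one in the linked repo:

```python
import torch
import torch.nn as nn
from pytorch_transformers import BertModel


class BertCnnClassifier(nn.Module):
    """Illustrative only: frozen BERT encoder with a small CNN head on top."""

    def __init__(self, bert_model_name="bert-base-uncased", num_labels=2,
                 num_filters=128, kernel_sizes=(3, 4, 5)):
        super(BertCnnClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)

        # (a) Freeze all of BERT: no gradients flow into the encoder.
        for param in self.bert.parameters():
            param.requires_grad = False

        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # (b) 1D convolutions over the token dimension, then max-pool over time.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        # sequence_output: (batch, seq_len, hidden_size) -- one vector per token
        sequence_output = self.bert(input_ids,
                                    token_type_ids=token_type_ids,
                                    attention_mask=attention_mask)[0]
        x = sequence_output.transpose(1, 2)            # (batch, hidden, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2)[0]    # (batch, num_filters) each
                  for conv in self.convs]
        return self.classifier(self.dropout(torch.cat(pooled, dim=1)))
```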

pythonometrist commented 5 years ago

Cool - let me try it out. While config.hidden_size is the size of the last layer from BERT (and in some sense the size of my embedding), I am struggling to figure out the size of the vocabulary. It's probably the BERT vocabulary size hiding somewhere in the config. max_seq_length is user specified, so we can already assume padded sequences. Agreed, the rest is carefully initializing the model and writing up the forward correctly (which might be non-trivial for me!). Let me get back to you. Thanks.
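Both sizes do sit on the config object (sketch; 30522 is the WordPiece vocabulary of bert-base-uncased, and max_seq_length is whatever you pad/truncate to at preprocessing time):

```python
from pytorch_transformers import BertConfig

config = BertConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size)  # 768   -> width of each token / sentence vector
print(config.vocab_size)   # 30522 -> BERT's WordPiece vocabulary size
# max_seq_length is user specified (e.g. 128), so every batch arrives
# already padded/truncated to that fixed length.
```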

ThilinaRajapakse commented 5 years ago

If it doesn't work, you can always decouple BERT and the CNN and just feed the BERT outputs to the CNN.
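Something like this hypothetical two-step sketch: run BERT once under no_grad, keep the features, and train only the CNN on them.

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

text = "An example sentence."
tokens = ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():  # BERT stays fixed; no gradients, no fine-tuning
    sequence_output, pooled_output = bert(input_ids)[:2]

# sequence_output: (1, seq_len, 768) -> per-token features to feed the CNN
# pooled_output:   (1, 768)          -> single sentence vector from the [CLS] token
```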

I'm no expert myself, but you seem to be doing fine to me!

pythonometrist commented 5 years ago

Well - I got a model to work with some simple linear layers, so that is progress. I need to work out tensor sizes - BERT is sending out tensors of shape (64 x 768), where 64 is the batch size, so I assume I am receiving one embedding of size 768 per sentence. I've got to figure out how to go from there to a Vocabulary x Document matrix - I think it means that somewhere BERT is averaging over the words. Or I simply need to forget about word embeddings and just do a 1D convolution at the document level... will think some more and update.
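A quick shape check clears this up (sketch; assumes bert-base, batch of 64, max_seq_length of 128): the (64, 768) tensor is the pooled [CLS] vector rather than an average over words, and the unaveraged per-token states come back alongside it.

```python
import torch
from pytorch_transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
input_ids = torch.randint(0, bert.config.vocab_size, (64, 128))  # dummy padded batch

sequence_output, pooled_output = bert(input_ids)[:2]
print(pooled_output.shape)    # torch.Size([64, 768])      -> the (64 x 768) tensor above
print(sequence_output.shape)  # torch.Size([64, 128, 768]) -> one 768-dim vector per token;
                              #    this is what a 1D convolution would run over
```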

pythonometrist commented 5 years ago

You da boss. Yep, you can do all sorts of models once you realize they offer up access to all the layers to convolve/LSTM over. I am curious if you know about the Apex installation - one version seems to be pure Python while the other uses a C++ compiler - which one do you use?
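For example, requesting every layer's hidden states (a sketch that assumes pytorch-transformers forwards extra config kwargs through from_pretrained):

```python
import torch
from pytorch_transformers import BertModel

# output_hidden_states=True gets absorbed into the config (assumed kwarg pass-through)
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

input_ids = torch.randint(0, bert.config.vocab_size, (4, 64))
outputs = bert(input_ids)
all_hidden_states = outputs[2]   # tuple of 13 tensors (embeddings + 12 layers),
                                 # each (4, 64, 768) -- convolve or run an LSTM over any of them
print(len(all_hidden_states))
```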

ThilinaRajapakse commented 5 years ago

Great!

I use the Apex version with C++ extensions. The pure python version is lacking a few features. I don't see any reason not to use the C++ version.

pythonometrist commented 5 years ago

I am having some issues with Apex on a Debian server... well, fingers crossed. Thanks for all the input! I had been wanting to get into PyTorch for a while and now I am in!

ThilinaRajapakse commented 5 years ago

Odd. I never had issues with any Ubuntu-based distros.

Welcome to Pytorch!

pythonometrist commented 5 years ago

Thanks - it's a server that is stuck on pip 8.1, but it looks like I could get it to work with conda. Fingers crossed.

pythonometrist commented 5 years ago

Ok, it works with conda!!! Should Apex's keep_batchnorm_fp32 be True? And O1 vs O2 - which one worked for you?

ThilinaRajapakse commented 5 years ago

I don't think I changed batchnorm. Doesn't it get set when you change the opt level? I used O1. O2 was giving me NaN losses.
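The O1 setup is essentially this (sketch; the toy model and optimizer below are placeholders, not this repo's training loop):

```python
import torch
from apex import amp  # needs the C++/CUDA build of Apex

# Placeholder model/optimizer just to show the amp calls.
model = torch.nn.Linear(768, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# O1 patches Torch functions to run in FP16 where it is safe; options such as
# keep_batchnorm_fp32 are left at None and resolved by amp for the chosen level.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(8, 768).cuda()).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```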

pythonometrist commented 5 years ago

Defaults for this optimization level are:

enabled: True
opt_level: O1
cast_model_type: None
patch_torch_functions: True
keep_batchnorm_fp32: None
master_weights: None
loss_scale: dynamic

That is the default when I run the models - not sure if keep_batchnorm_fp32: None should be something else. I'll dig around and report.

ThilinaRajapakse commented 5 years ago

Yeah, I just kept the defaults there.