pythonometrist opened 5 years ago
Should be pretty similar to adding custom losses. You can freeze all the layers by setting requires_grad = False
for all of them in your subclassed model. You can add your convolutional layers to it as well, and define how you want them to be used in the forward
method.
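A minimal sketch of that subclassing idea, assuming a HuggingFace-style BERT that maps token ids `(batch, seq_len)` to hidden states `(batch, seq_len, hidden_size)`. The class name `FrozenBertCNN` is made up for illustration, and a toy `nn.Embedding` stands in for the real pretrained encoder so the snippet runs on its own:

```python
import torch
import torch.nn as nn

class FrozenBertCNN(nn.Module):
    """Sketch: frozen encoder + CNN head for sentence classification."""

    def __init__(self, encoder, hidden_size=768, n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze all of BERT
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden_size, n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, input_ids):
        hidden = self.encoder(input_ids)           # (batch, seq, hidden)
        x = hidden.transpose(1, 2)                 # Conv1d wants (batch, channels, seq)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(feats, dim=1))    # (batch, n_classes)

# Toy stand-in for the pretrained encoder so the sketch is self-contained.
dummy_bert = nn.Embedding(30522, 768)
model = FrozenBertCNN(dummy_bert)
logits = model(torch.randint(0, 30522, (4, 64)))   # (4, 2)
```

Loading the pretrained weights should be unaffected, since freezing only flips `requires_grad` and doesn't touch the state dict.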
Hopefully, it won't mess with loading the weights from the pretrained model. I don't think it will.
Cool - let me try it out. While config.hidden_size is the size of the last layer from BERT (and in some sense the size of my embedding), I am struggling to figure out the size of the vocabulary. It's probably the BERT vocabulary size hiding somewhere in the config. max_seq_length is user specified, so we can already assume padded sequences. Agreed, the rest is carefully initializing the model and writing up the forward correctly... (which might be non-trivial for me!) Let me get back to you. Thanks.
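For reference, the vocabulary size does live in the config as `vocab_size`. A stock bert-base-uncased config carries these fields (the values below are the published ones for that checkpoint):

```python
# Relevant fields from a bert-base-uncased config (published values):
bert_config = {
    "vocab_size": 30522,              # size of the WordPiece vocabulary
    "hidden_size": 768,               # embedding / last-layer size
    "max_position_embeddings": 512,   # hard cap on sequence length
}

# So the token embedding matrix is Vocabulary x hidden_size:
embedding_matrix_shape = (bert_config["vocab_size"], bert_config["hidden_size"])
```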
If it doesn't work, you can always decouple BERT and the CNN and just feed the BERT outputs to the CNN.
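The decoupled route can be sketched like this: run BERT once under `torch.no_grad()`, cache the outputs, and train the CNN head on the cached tensors alone. A toy stand-in module replaces the real BERT so the snippet runs on its own:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained BERT encoder; only the shape contract matters.
encoder = nn.Embedding(30522, 768)

batches = [torch.randint(0, 30522, (8, 64)) for _ in range(3)]

# Step 1: run BERT once, with no gradients, and cache its outputs.
encoder.eval()
with torch.no_grad():
    cached = [encoder(b) for b in batches]        # each (8, 64, 768)

# Step 2: train the CNN head on the cached features alone.
cnn_head = nn.Sequential(
    nn.Conv1d(768, 100, kernel_size=3), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(100, 2))
logits = cnn_head(cached[0].transpose(1, 2))       # (8, 2)
```

This trades memory for speed: the expensive BERT forward pass happens once per example instead of once per epoch.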
I'm no expert myself, but you seem to be doing fine to me!
Well - I got a model to work with some simple linear layers, so that is progress. I need to work out tensor sizes: BERT is sending out tensors of shape (64, 768), where 64 is the batch size, so I assume I am receiving one embedding of size 768 for each sentence. I've got to figure out how to go from there to a Vocabulary x Document matrix - I think it means that somewhere BERT is averaging over the words. Or I simply need to forget about word embeddings and do a 1D convolution at the document level... will think some more and update.
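For what it's worth, that (64, 768) tensor sounds like the pooled per-sentence output; if memory serves, the HuggingFace BertModel also returns token-level hidden states of shape (batch, seq_len, 768), and that is the tensor you'd convolve over. A shape-only sketch with dummy tensors:

```python
import torch
import torch.nn as nn

batch, seq_len, hidden = 64, 128, 768

# Token-level output: one 768-d vector per token (what a CNN wants).
sequence_output = torch.randn(batch, seq_len, hidden)
# Pooled output: one 768-d vector per sentence (the (64, 768) tensor).
pooled_output = sequence_output[:, 0, :]  # roughly: the [CLS] position

# 1D convolution over the token axis, treating hidden dims as channels.
conv = nn.Conv1d(in_channels=hidden, out_channels=100, kernel_size=5)
conv_out = conv(sequence_output.transpose(1, 2))  # (64, 100, 124)
```

So there's no need for a Vocabulary x Document matrix: the CNN slides over the contextual token vectors directly, just as the Shawn1993 model slides over word embeddings.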
You da boss. Yep, you can do all sorts of models once you realize they offer up access to all the layers to convolve/LSTM over. I am curious if you know about the Apex installation - one version seems to be pure Python while the other uses the C++ compiler - which one do you use?
Great!
I use the Apex version with C++ extensions. The pure Python version is lacking a few features. I don't see any reason not to use the C++ version.
I am having some issues with Apex on a Debian server... well, fingers crossed. Thanks for all the input! I had been wanting to get into PyTorch for a while and now I am in!
Odd. I never had issues with any Ubuntu-based distros.
Welcome to PyTorch!
Thanks - it's a server which is stuck on pip 8.1. But it looks like I could get it to work with conda. Fingers crossed.
OK, it works with conda!!! Should Apex's keep_batchnorm_fp32 be True? And O1 vs O2 - which one worked for you?
I don't think I changed batchnorm. Doesn't it get set when you change the opt level? I used O1. O2 was giving me NaN losses.
Defaults for this optimization level are:
enabled: True
opt_level: O1
cast_model_type: None
patch_torch_functions: True
keep_batchnorm_fp32: None
master_weights: None
loss_scale: dynamic
That is the default when I run the models - not sure if keep_batchnorm_fp32: None should be something else. I'll dig around and report.
Yeah, I just kept the defaults there.
Thanks to your help - I have added custom losses, special initialization and a bunch of other things as extensions.
I am now trying to mess with the sentence classification model itself. It is a linear layer on top of the BERT model. What I would like to do is: a) freeze all of BERT; b) add a CNN on top, along the lines of https://github.com/Shawn1993/cnn-text-classification-pytorch/blob/master/model.py
I want to compare results with a frozen and an unfrozen BERT. Any pointers would be most appreciated.
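One way to run the frozen-vs-unfrozen comparison is a small toggle helper. This is a sketch: `set_bert_trainable` is a made-up name, and a toy module stands in for the real BERT so the snippet is self-contained:

```python
import torch.nn as nn

def set_bert_trainable(bert: nn.Module, trainable: bool) -> None:
    """Toggle requires_grad on every parameter of the given module."""
    for p in bert.parameters():
        p.requires_grad = trainable

# Toy stand-in for BERT so the sketch runs on its own.
bert = nn.Sequential(nn.Embedding(30522, 768), nn.Linear(768, 768))

set_bert_trainable(bert, False)   # frozen run
frozen = sum(p.numel() for p in bert.parameters() if p.requires_grad)

set_bert_trainable(bert, True)    # unfrozen run
unfrozen = sum(p.numel() for p in bert.parameters() if p.requires_grad)
```

When BERT is frozen, remember to build the optimizer from only the trainable parameters, e.g. `filter(lambda p: p.requires_grad, model.parameters())`, so the frozen weights never receive updates.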