amirgholami / adahessian

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Language Modelling code #4

Closed arvieFrydenlund closed 4 years ago

arvieFrydenlund commented 4 years ago

Apologies if I have this wrong, but is there code for the language modelling experiments? I think that /transformer only contains the NMT experiments. Thanks.

yaozhewei commented 4 years ago

Hi, we are cleaning up the LM code. It will be online soon.

You can also add support for 3D kernels yourself (for example, the 1D conv for character-level LM) by changing this block in https://github.com/amirgholami/adahessian/blob/master/transformer/fairseq/optim/adahessian.py:

```python
for hv, vi in zip(hvs, v):
    param_size = hv.size()
    if len(param_size) <= 1:  # for Bias and LN
        tmp_output = torch.abs(hv * vi) + 0.
        hutchinson_trace.append(tmp_output)
    elif len(param_size) == 2:  # Matrix
        # flatten to N times self.block_length
        tmp_output1 = torch.abs(hv * vi + 0.).view(-1, self.block_length)
        tmp_output2 = torch.abs(torch.sum(tmp_output1, dim=[1])).view(-1) / float(self.block_length)
        tmp_output3 = tmp_output2.repeat_interleave(self.block_length).view(param_size)
        hutchinson_trace.append(tmp_output3)
```

yaozhewei commented 4 years ago

Hi, we added instructions for supporting different kernel sizes, so I am closing this issue since it is covered there.