arvieFrydenlund closed this issue 4 years ago

Apologies if I have this wrong, but is there code for the language modelling experiments? I think that /transformer only contains the NMT experiments. Thanks.
Hi, we are cleaning up the code for LM; it will be online soon.
You can also modify the following block in https://github.com/amirgholami/adahessian/blob/master/transformer/fairseq/optim/adahessian.py yourself to support 3D kernels (for example, the 1D conv for character-level LM):

```python
for hv, vi in zip(hvs, v):
    param_size = hv.size()
    if len(param_size) <= 1:  # for bias and LayerNorm parameters
        tmp_output = torch.abs(hv * vi) + 0.
        hutchinson_trace.append(tmp_output)
    elif len(param_size) == 2:  # weight matrices
        # flatten to N times self.block_length
        tmp_output1 = torch.abs((hv * vi + 0.)).view(-1, self.block_length)
        tmp_output2 = torch.abs(torch.sum(tmp_output1, dim=[1])).view(-1) / float(self.block_length)
        tmp_output3 = tmp_output2.repeat_interleave(self.block_length).view(param_size)
        hutchinson_trace.append(tmp_output3)
```
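For reference, here is a minimal sketch of what such an extension might look like; this branch is not from the repository. It assumes a 1D conv weight of shape (out_channels, in_channels, kernel_size) and averages the Hutchinson estimate over the kernel dimension, in the spirit of the block averaging used for matrices above and the spatial averaging described in the AdaHessian paper for convolutional kernels:

```python
    # Hypothetical extra branch for the loop above (not part of the repo):
    # handle 3D kernels such as 1D convolutions, assuming weights of shape
    # (out_channels, in_channels, kernel_size).
    elif len(param_size) == 3:
        tmp_output = torch.abs(hv * vi) + 0.
        # average the estimate over the kernel (last) dimension, then
        # broadcast the mean back to the full parameter shape
        tmp_output = torch.mean(tmp_output, dim=[2], keepdim=True)
        hutchinson_trace.append(tmp_output.expand(*param_size).contiguous())
```

Whether a plain mean over the kernel dimension or the block-wise averaging used for matrices gives better smoothing here is a design choice worth experimenting with.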
Hi, we have added an instruction to support different kernel sizes, and we are closing this issue since it is covered there.