castorini / castor

PyTorch deep learning models for text processing
http://castor.ai/
Apache License 2.0

Replication of SOTA for Reuters Dataset #152

Closed achyudh closed 5 years ago

achyudh commented 5 years ago

Work in progress: Please don't merge until all of the tasks above are done.

achyudh commented 5 years ago

Corresponding pull-request in Castor-Data: https://git.uwaterloo.ca/jimmylin/Castor-data/merge_requests/11

Impavidity commented 5 years ago

@achyudhk Regarding the data format: if you keep the label text in the dataset, you can use a function to convert it. This is the helper I use in my own scripts; you could do something similar:

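import torch

def one_hot_representation(shape, dim, idx, value):
    # Zero tensor of the given shape, created on the same device as idx
    one_hot = torch.zeros(*shape, dtype=torch.long, device=idx.device)
    # Write `value` at the positions listed in `idx` along dimension `dim`
    one_hot.scatter_(dim, idx, value)
    return one_hot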
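
A minimal usage sketch of the helper above, assuming the labels have already been mapped from text to integer class indices. The batch of two documents and the four classes are made up for illustration and are not taken from the Reuters data:

```python
import torch

def one_hot_representation(shape, dim, idx, value):
    # Zero tensor of the given shape, created on the same device as idx
    one_hot = torch.zeros(*shape, dtype=torch.long, device=idx.device)
    # Write `value` at the positions listed in `idx` along dimension `dim`
    one_hot.scatter_(dim, idx, value)
    return one_hot

# Hypothetical batch: 2 documents, 4 classes, 2 labels per document
labels = torch.tensor([[0, 2],
                       [1, 3]])
one_hot = one_hot_representation((2, 4), 1, labels, 1)
print(one_hot)
# tensor([[1, 0, 1, 0],
#         [0, 1, 0, 1]])
```

Because scatter_ writes every index in a row, the same call covers both the single-label case (one index per row) and the multi-label case (several indices per row).
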
achyudh commented 5 years ago

@Impavidity Thanks, I'll change the existing datasets I pushed to Castor-data to this format.

Ashutosh-Adhikari commented 5 years ago

Additional SOTAs:

Ashutosh-Adhikari commented 5 years ago

Steps for LSTM_Regularization:

achyudh commented 5 years ago

@Impavidity I made the changes you requested. Please take a look at the diff.