Open ajay-vikram opened 3 months ago
Hi Ajay, you may be running with the data shaped differently. We expect that the out tensor is shaped [4*hidden_state, batch_size], so I would expect that out should be shaped [256, 1] and not [256].
At benchmark.py:125 (batch_results[m] = self.workload_metrics[m](self.model, preds, data)), can you please check the shape of preds and data? Otherwise, it may be an issue with the hook connected to the RNNCell which tracks inputs.
Also, there is the LSTM example for a different sequence task here which may be helpful.
Hi Jason, the shapes of preds and data are [256, 2] and ([256, 1, 96], [256, 2]) respectively, where data is a tuple. These are the inputs to my model as well. What shape do you expect as input to the LSTMCell? In my case, a [1, 96] tensor goes to the LSTMCell. This [1, 96] tensor comes from acc_spikes in the buffering mechanism of the forward pass, similar to the one in primate_example.
The shape looks reasonable to me. Can you check whether your code matches the code block from the previous issue #225? That works with the latest neurobench package, 1.0.6, as well as with any arbitrary batch size. If there are still issues, please post your code block so we can inspect the error.
Ohh, I see. I didn't get the latest version. How do I get it? Do I run .bumpversion.toml?
Run pip install --upgrade neurobench or, if you are using poetry and a local cloned repo, simply git pull on the main branch.
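To confirm which version is actually active in the environment after upgrading, a small standard-library check along these lines may help. This is only a sketch: the helper names installed_version and at_least are made up for illustration, and the comparison assumes a plain dotted numeric version string such as the 1.0.6 mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(found, required):
    """Compare dotted numeric versions; assumes every part is a plain integer."""
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return as_tuple(found) >= as_tuple(required)


found = installed_version("neurobench")
if found is None:
    print("neurobench is not installed in this environment")
else:
    print("neurobench {} found; at least 1.0.6: {}".format(found, at_least(found, "1.0.6")))
```

Running this with the same interpreter that runs the benchmark script avoids checking a different environment than the one that raises the error.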
Still getting the same issue. Can you tell me which code has been modified? I'll check whether the changes are present in my copy.
Changes are listed in #227
Please check if you can successfully run the minimal example from the code block in #225
If there is still an issue, please provide a minimal example of the model definition and harness call which causes the issue.
Yes, the minimal example code runs.
Here's my model definition:
import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_dim):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.output_dim = 2

        self.lstm = nn.LSTMCell(self.input_dim, 64)
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, self.output_dim)
        self.layernorm0 = nn.LayerNorm(self.input_dim)
        self.layernorm1 = nn.LayerNorm(32)
        self.layernorm2 = nn.LayerNorm(16)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

        self.bin_window_time = 0.2
        self.sampling_rate = 0.004
        self.bin_window_size = int(self.bin_window_time / self.sampling_rate)
        self.register_buffer("data_buffer", torch.zeros(1, self.input_dim).type(torch.float32), persistent=False)

    def single_forward(self, x):
        x = x.unsqueeze(0)
        x = self.layernorm0(x)
        (hn, cn) = self.lstm(x)
        out = self.relu(hn)
        out = self.layernorm1(self.relu(self.fc1(out)))
        out = self.dropout(out)
        out = self.layernorm2(self.relu(self.fc2(out)))
        out = self.fc3(out)
        return out

    def forward(self, x):
        predictions = []
        seq_length = x.shape[0]
        for seq in range(seq_length):
            current_seq = x[seq, :, :]
            self.data_buffer = torch.cat((self.data_buffer, current_seq), dim=0)

            if self.data_buffer.shape[0] <= self.bin_window_size:
                predictions.append(torch.zeros(1, self.output_dim).to(x.device))
            else:
                # Only pass input into model when the buffer size == bin_window_size
                if self.data_buffer.shape[0] > self.bin_window_size:
                    self.data_buffer = self.data_buffer[1:, :]

                # Accumulate
                spikes = self.data_buffer.clone()
                acc_spikes = torch.sum(spikes, dim=0)

                pred = self.single_forward(acc_spikes)
                predictions.append(pred)

        predictions = torch.stack(predictions).squeeze(dim=1)
        return predictions
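For readers following the buffering logic in forward, the same sliding-window accumulation can be sketched in isolation on plain numbers. This is a simplified illustration, not the model code: windowed_sums is a hypothetical helper, it works on scalars rather than [1, 96] tensors, and it returns None where the model emits a zero prediction.

```python
from collections import deque


def windowed_sums(samples, window):
    """Sum the latest `window` samples at each step; None until the window is full."""
    buffer = deque(maxlen=window)  # the oldest sample is dropped automatically
    sums = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) < window:
            sums.append(None)  # not enough history yet
        else:
            sums.append(sum(buffer))  # accumulate over the full window
    return sums


print(windowed_sums([1, 2, 3, 4, 5], window=3))
```

With a window of 3, the first two steps have no full window, and every later step sums the three most recent samples.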
This is the benchmark code
import torch
from torch.utils.data import DataLoader, Subset
from neurobench.datasets import PrimateReaching
from neurobench.models.torch_model import TorchModel
from neurobench.benchmarks import Benchmark
from ANN import ANNModel2D
from GRU import GRU
from LSTM import LSTM
all_files = ["indy_20160622_01"]
# all_files = ["indy_20160622_01", "indy_20160630_01", "indy_20170131_02",
# "loco_20170210_03", "loco_20170215_02", "loco_20170301_05"]
footprint = []
connection_sparsity = []
activation_sparsity = []
dense = []
macs = []
acs = []
r2 = []
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
for filename in all_files:
    print("Processing {}".format(filename))

    # The dataloader and preprocessor has been combined together into a single class
    data_dir = "/home/satyapreets/Ajay/neurobench/neurobench/data" # data in repo root dir
    dataset = PrimateReaching(file_path=data_dir, filename=filename,
                              num_steps=1, train_ratio=0.5, bin_width=0.004,
                              biological_delay=0, remove_segments_inactive=False)
    test_set_loader = DataLoader(Subset(dataset, dataset.ind_test), batch_size=256, shuffle=False)

    net = LSTM(input_dim=dataset.input_feature_size)
    # net = ANNModel2D(input_dim=dataset.input_feature_size, layer1=32, layer2=48,
    #                  output_dim=2, bin_window=0.2, drop_rate=0.5)
    net.load_state_dict(torch.load("/home/satyapreets/Ajay/neurobench/mobilenet_training/experiments/vww/submission/lstm_64_indy_20160622_01.pt", map_location=device)['state_dict'])
    # net.load_state_dict(torch.load("./model_data/2D_ANN_Weight/"+filename+"_model_state_dict.pth", map_location=device))

    model = TorchModel(net)

    static_metrics = ["footprint", "connection_sparsity"]
    workload_metrics = ["r2", "activation_sparsity", "synaptic_operations"]

    # Benchmark expects the following:
    benchmark = Benchmark(model, test_set_loader, [], [], [static_metrics, workload_metrics])
    results = benchmark.run(device=device)
    print(results)

    footprint.append(results['footprint'])
    connection_sparsity.append(results['connection_sparsity'])
    activation_sparsity.append(results['activation_sparsity'])
    dense.append(results['synaptic_operations']['Dense'])
    macs.append(results['synaptic_operations']['Effective_MACs'])
    acs.append(results['synaptic_operations']['Effective_ACs'])
    r2.append(results['r2'])

print("Footprint: {}".format(footprint))
print("Connection sparsity: {}".format(connection_sparsity))
print("Activation sparsity: {}".format(activation_sparsity), sum(activation_sparsity)/len(activation_sparsity))
print("Dense: {}".format(dense), sum(dense)/len(dense))
print("MACs: {}".format(macs), sum(macs)/len(macs))
print("ACs: {}".format(acs), sum(acs)/len(acs))
print("R2: {}".format(r2), sum(r2)/len(r2))
# Footprint: [20824, 20824, 20824, 33496, 33496, 33496]
# Connection sparsity: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Activation sparsity: [0.7068512007122443, 0.7274494314849341, 0.6142621034584272, 0.6290474755671983, 0.6793054885963405, 0.6963649652600741] 0.6755467775132032
# Dense: [4702.261627687736, 4701.8430499148435, 4699.549582947173, 7773.2197567257945, 7771.01773105288, 7772.632844051291] 6236.754098729952
# MACs: [4306.322415210456, 3595.209672287623, 3607.261044176707, 5851.9819915795315, 5995.014802029395, 6462.786839756449] 4969.76279417336
# ACs: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 0.0
# R2: [0.6327020525932312, 0.5241347551345825, 0.6216747164726257, 0.5727078914642334, 0.4745999276638031, 0.6272222995758057] 0.5755069404840469
Hi Ajay, I noticed that your LSTMCell forward call does not include (h, c) in the inputs. Based on the documentation, if these are not included, I believe that the recurrent state of the LSTM is not tracked at all, and the LSTM block is essentially just an MLP-type transform. I may be wrong on this, though.
Regardless, note that all of our other LSTM examples use the forward convention hx, cx = rnn(input[i], (hx, cx)) for the LSTMCell, and not just hx, cx = rnn(input[i]).
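The difference between the two calling conventions can be illustrated with a toy scalar cell. This is a hypothetical stand-in written for illustration, not nn.LSTMCell itself; the only behavior it borrows is that a missing state defaults to zero.

```python
import math


def toy_cell(x, state=None):
    """Hypothetical scalar recurrent cell; a missing state defaults to zero."""
    h = 0.0 if state is None else state
    return math.tanh(0.5 * x + 0.9 * h)


inputs = [1.0, 1.0, 1.0]

# Convention without state: every call starts from zero, so nothing carries over.
stateless = [toy_cell(x) for x in inputs]

# Convention with state: each output is fed back in as the next call's state.
stateful = []
h = None
for x in inputs:
    h = toy_cell(x, h)
    stateful.append(h)

print("stateless:", stateless)
print("stateful: ", stateful)
```

Here stateless holds three identical values, because each call sees only the current input, while stateful changes from step to step because each output feeds the next call.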
By making the additions to your model definition shown in the code block below, there is no longer a harness runtime error:
import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_dim):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.output_dim = 2

        self.lstm = nn.LSTMCell(self.input_dim, 64)
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, self.output_dim)
        self.layernorm0 = nn.LayerNorm(self.input_dim)
        self.layernorm1 = nn.LayerNorm(32)
        self.layernorm2 = nn.LayerNorm(16)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)

        self.bin_window_time = 0.2
        self.sampling_rate = 0.004
        self.bin_window_size = int(self.bin_window_time / self.sampling_rate)
        self.register_buffer("data_buffer", torch.zeros(1, self.input_dim).type(torch.float32), persistent=False)

        # Recurrent state, initialized at the start of each forward call
        self.h = None
        self.c = None

    def single_forward(self, x):
        x = x.unsqueeze(0)
        x = self.layernorm0(x)
        # Pass the previous (h, c) in explicitly and keep the updated state
        self.h, self.c = self.lstm(x, (self.h, self.c))
        out = self.relu(self.h)
        out = self.layernorm1(self.relu(self.fc1(out)))
        out = self.dropout(out)
        out = self.layernorm2(self.relu(self.fc2(out)))
        out = self.fc3(out)
        return out

    def forward(self, x):
        predictions = []

        # Reset the recurrent state to zeros for this call
        self.h = torch.zeros(1, 64).to(x.device)
        self.c = torch.zeros(1, 64).to(x.device)

        seq_length = x.shape[0]
        for seq in range(seq_length):
            current_seq = x[seq, :, :]
            self.data_buffer = torch.cat((self.data_buffer, current_seq), dim=0)

            if self.data_buffer.shape[0] <= self.bin_window_size:
                predictions.append(torch.zeros(1, self.output_dim).to(x.device))
            else:
                # Only pass input into model when the buffer size == bin_window_size
                if self.data_buffer.shape[0] > self.bin_window_size:
                    self.data_buffer = self.data_buffer[1:, :]

                # Accumulate
                spikes = self.data_buffer.clone()
                acc_spikes = torch.sum(spikes, dim=0)

                pred = self.single_forward(acc_spikes)
                predictions.append(pred)

        predictions = torch.stack(predictions).squeeze(dim=1)
        return predictions
The harness should be able to support the case where (h, c) is not passed into the LSTMCell, so this is still an issue. But I recommend that you include (h, c) in the inputs.
Aah, I see. I read somewhere in the documentation that LSTMs by default initialize their hidden and cell states to a tensor of 0s, which is why I didn't explicitly add it. Thanks a lot!!
Also, will I have to retrain my models with these changes incorporated? I just changed the model but loaded the same weights I had before the explicit h and c definition, and the neurobench benchmarks are running fine.
My guess is that you will need to retrain the model, as it is now tracking recurrent state and it wasn't before. I suggest that you take out all of the metrics except the R2 workload metric and first verify you are getting the expected accuracy before considering the compute complexity.
Alright thanks a lot!
TODO: support synops for RNNCells which do not use recurrent input
I have trained a Recurrent network using an LSTMCell and MLP layers. But when I load the model and the weights for running the benchmark, I get "RuntimeError: output with shape [256] doesn't match the broadcast shape [256, 256]". Tracing it backwards, it originates from the utils.py file on line 291 (out += biases). On printing the shapes of out and biases, I got [256] and [256, 1] respectively. Squeezing out the 2nd dimension from biases resolves the issue, but I am unsure whether there is a mistake with the benchmark code or with how my model is defined. I faced a similar issue on using a GRUCell. Can I please get some help?
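The reported message is consistent with the usual NumPy/PyTorch broadcasting rule, which can be sketched in pure Python as follows. The helpers broadcast_shape and can_add_in_place are hypothetical names used only for this illustration; they mimic the shape rule and are not part of the harness.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes by aligning them from the trailing dimension."""
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and 1 not in (da, db):
            raise ValueError("shapes {} and {} are not broadcastable".format(a, b))
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def can_add_in_place(out, other):
    """An in-place `out += other` requires the broadcast result to keep out's shape."""
    return broadcast_shape(out, other) == tuple(out)


print(broadcast_shape((256,), (256, 1)))      # shape the sum would need
print(can_add_in_place((256,), (256, 1)))     # the failing case from the report
print(can_add_in_place((256, 1), (256, 1)))   # the shape the harness expects
```

A [256] tensor and a [256, 1] tensor broadcast to [256, 256], which cannot be written back into the [256] output. That matches both the error text and the observation that squeezing biases makes the shapes agree.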