Closed · anirudhprabhu closed this issue 4 years ago
It is a difference in how the tensor is stored in memory; `reshape`, or a call to `.contiguous()` before `view`, should fix it. I made the latter change, thanks for raising the issue!
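For anyone hitting this later, a minimal sketch of what that change looks like (illustrative only, not the exact line from the notebook):

```python
import torch

x = torch.randn(4, 4).t()        # .t() returns a non-contiguous view

# x.view(16) raises the "size and stride" RuntimeError on a tensor like this;
# either of the following works instead:
a = x.contiguous().view(16)      # copy into contiguous memory, then view
b = x.reshape(16)                # reshape copies only when it has to
```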
@cezannec I am confused, can you explain what is going on underneath? Why does it make a difference?
References to check that might be helpful:
@anirudhprabhu Having the same issue, however the conclusion isn't really what I wanted. Here is the reason why.

> I just replaced the `view` function with `reshape` as suggested in the error and it works. Though I am still not sure of the difference between the two functions in this context.

The reason `reshape` or `contiguous` work is that, unlike `view`, they will copy the tensor itself: https://pytorch.org/docs/stable/generated/torch.Tensor.view.html. So if you're trying to avoid duplicating tensors, this isn't going to be an optimal solution for you. If you're fine with possibly duplicating tensors, then you might as well use `reshape` or `contiguous` directly.
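A quick way to see the difference in practice (a toy example, not from the notebook):

```python
import torch

x = torch.arange(6).reshape(2, 3)    # contiguous: [[0, 1, 2], [3, 4, 5]]
y = x.t()                            # a view with swapped strides, no copy yet
print(y.is_contiguous())             # False

# y.view(-1) fails here, because view() can only reinterpret memory that is
# already laid out contiguously; reshape() / contiguous() copy the data instead
print(y.reshape(-1).tolist())            # [0, 3, 1, 4, 2, 5]
print(y.contiguous().view(-1).tolist())  # [0, 3, 1, 4, 2, 5]
```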
For me, I want to avoid tensor duplication. I think this issue occurs when you're making views of views, or views of indexed tensors (or especially indexed tensors since you're literally filtering the shape).
I'm mainly posting this for others to note.
TLDR: `reshape` or `contiguous` work because they sometimes copy the tensor, whereas `view` guarantees that the tensor won't be duplicated when it is called. But of course... sometimes you must duplicate the tensor :/
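If duplication is the concern, one way to check whether a copy actually happened is to compare storage pointers (again, just a sketch):

```python
import torch

x = torch.arange(12).reshape(3, 4)

# on a contiguous tensor, reshape() returns a view -- same storage, no copy
v = x.reshape(4, 3)
print(v.data_ptr() == x.data_ptr())   # True

# on a non-contiguous tensor (e.g. a transpose), reshape() has to copy
nc = x.t()
c = nc.reshape(-1)
print(c.data_ptr() == nc.data_ptr())  # False
```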
Hi,
Based on the suggestion in the error I used reshape() instead of view(). I also tested contiguous().view(-1), but I still face the same error. Could anyone assist with it?
This is my error: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
import torch
from tqdm import tqdm
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

with open("question_answer.txt", "r") as file:
    text = file.read()
questions = text.split("\n")[:-1]
answers = text.split("\n")[1:]

# Define the maximum number of lines for training
max_lines = 50

# Create a progress bar
progress_bar = tqdm(total=min(max_lines, len(questions)), desc="Processing")

batch_size = 4  # Adjust the batch size according to your memory capacity
inputs = []
target_texts = []
loss_values = []  # Store the losses for each batch
optimizer_values = []  # Store the optimizer values for each batch

for i, question in enumerate(questions[:max_lines]):
    if i % batch_size == 0 and i != 0:
        tokenized_inputs = tokenizer.batch_encode_plus(
            inputs,
            padding="longest",
            truncation=True,
            return_tensors="pt"
        )
        tokenized_targets = tokenizer.batch_encode_plus(
            target_texts,
            padding="longest",
            truncation=True,
            return_tensors="pt"
        )
        input_ids = tokenized_inputs["input_ids"]
        attention_mask = tokenized_inputs["attention_mask"]
        target_ids = tokenized_targets["input_ids"]
        decoder_attention_mask = tokenized_targets["attention_mask"]

        model.train()
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        loss_fn = torch.nn.CrossEntropyLoss()
        optimizer.zero_grad()
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_input_ids=target_ids[:, :-1],
            decoder_attention_mask=decoder_attention_mask[:, :-1],
            labels=target_ids[:, 1:]
        )
        lm_logits = outputs.logits
        loss = loss_fn(lm_logits.reshape(-1, lm_logits.size(-1)), target_ids[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        loss_values.append(loss.item())
        optimizer_values.append(optimizer.param_groups[0]['lr'])
        inputs = []
        target_texts = []
        print(f"Batch {i//batch_size}, Loss: {loss.item()}")

    # `data` (presumably a pandas DataFrame used to fill the prompt template) is defined elsewhere
    input_text = question.format_map({'item': data.iloc[0]})
    inputs.append(input_text)
    target_texts.append(answers[i])

    # Update the progress bar
    progress_bar.update(1)
# Process the remaining batch
if inputs:
    tokenized_inputs = tokenizer.batch_encode_plus(
        inputs,
        padding="longest",
        truncation=True,
        return_tensors="pt"
    )
    tokenized_targets = tokenizer.batch_encode_plus(
        target_texts,
        padding="longest",
        truncation=True,
        return_tensors="pt"
    )
    input_ids = tokenized_inputs["input_ids"]
    attention_mask = tokenized_inputs["attention_mask"]
    target_ids = tokenized_targets["input_ids"]
    decoder_attention_mask = tokenized_targets["attention_mask"]

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer.zero_grad()
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        decoder_input_ids=target_ids[:, :-1],
        decoder_attention_mask=decoder_attention_mask[:, :-1],
        labels=target_ids[:, 1:]
    )
    lm_logits = outputs.logits
    loss = loss_fn(lm_logits.reshape(-1, lm_logits.size(-1)), target_ids[:, 1:].reshape(-1))
    # update the weights for the final partial batch as well
    loss.backward()
    optimizer.step()
    loss_values.append(loss.item())
    optimizer_values.append(optimizer.param_groups[0]['lr'])
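Not necessarily the cause of the error above, but worth noting for this kind of code: a column slice such as target_ids[:, 1:] is itself non-contiguous, so calling .view() on it (directly or somewhere downstream) triggers exactly this RuntimeError. A small standalone illustration:

```python
import torch

target_ids = torch.arange(20).reshape(4, 5)

shifted = target_ids[:, 1:]          # drops the first column, keeps the old strides
print(shifted.is_contiguous())       # False

# shifted.view(-1) would raise the "size and stride" RuntimeError;
# reshape() (or .contiguous().view()) performs the copy for you
print(shifted.reshape(-1).shape)     # torch.Size([16])
```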
Hi @SudoSaba @cezannec, I am still facing the same issue; reshape() or contiguous() did not fix mine.
Hello,
I am very new to capsule networks and PyTorch in general. Thank you for the detailed and easy-to-understand explanations. While I was trying to run the code, I came across an error when trying to train a model.
I have not changed any part of the code yet; I wanted to run it as-is before trying different things. Can you help me understand why such an error occurs and how to fix it?
Thank you!
EDIT: I just replaced the `view` function with `reshape` as suggested in the error and it works. Though I am still not sure of the difference between the two functions in this context.