Closed · briandw closed this issue 6 years ago
The error message you have comes from this:
import numpy as np
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096))]
embeddings = np.vstack(embeddings) # no error
embeddings = [np.zeros((64, 4096)), np.zeros((64, 4096)), np.zeros((64, 3412))]
embeddings = np.vstack(embeddings) # error
# -> ValueError: all the input array dimensions except for the concatenation axis must match exactly
For some reason, one of the elements in "embeddings" is not of size (batch_size=128, emb_dim=4096). So there must be one or more elements with a size different from (128, 4096).
1) Just before the error in line 209, could you print the shape of each element in embeddings?
for batch in embeddings:
print(batch.shape)
to see if we can spot the element with the wrong size.
2) What is in "sentences"? Can you check that you don't have an empty sentence?
3) What is the length of "sentences"?
4) Could you update pytorch to a more recent version and see if you still have the issue?
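Checks 2 and 3 above can be sketched in a few lines (the `sentences` list here is hypothetical stand-in data, not the actual input):

```python
# Hedged sketch: sanity-check the input before calling encode().
# `sentences` is hypothetical example data.
sentences = ["a cat sat on the mat", "", "hello world"]

print(len(sentences))  # check 3: length of the list

# check 2: indices of empty (or whitespace-only) sentences, if any
empty = [i for i, s in enumerate(sentences) if not s.strip()]
print(empty)
```

If `empty` is non-empty, those sentences would produce zero-length batches downstream.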
Thanks for the quick response.
I believe that I'm on the latest torch version of 0.1.12_1. Is there a later version?
The length of sentences is 9815, and there are no zero-length sentences in the array.
This is the output from just before line 209:
Nb words kept : 128201/130068 (98.56 %)
(1, 64, 4096)   <- repeated 153 times
(1, 23, 4096)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-3d88dd6254e6> in <module>()
1 tmp = sentences[:128]
----> 2 model.encode(sentences, tokenize=False, verbose=True)
/home/brian/InferSent/encoder/models.py in encode(self, sentences, bsize, tokenize, verbose)
210 for batch in embeddings:
211 print(batch.shape)
--> 212 embeddings = np.vstack(embeddings)
213
214 # unsort
/home/brian/anaconda3/envs/py2/lib/python2.7/site-packages/numpy/core/shape_base.pyc in vstack(tup)
235
236 """
--> 237 return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
238
239 def hstack(tup):
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Oh, OK, I get it. Can you try changing this line in models.py (https://github.com/facebookresearch/InferSent/blob/master/encoder/models.py#L67):
emb = torch.max(sent_output, 0)[0]
into:
emb = torch.max(sent_output, 0)[0].squeeze(0)
and see if this works then?
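To see why the squeeze matters for the later np.vstack call, here is a minimal sketch with dummy arrays shaped like the batches printed above:

```python
import numpy as np

# Old PyTorch leaves a leading singleton dim, so batches look like (1, bsize, 4096).
batches = [np.zeros((1, 64, 4096)), np.zeros((1, 23, 4096))]

# np.vstack concatenates along axis 0, so all other dims must match exactly;
# here dim 1 differs (64 vs 23), and vstack raises the ValueError seen above.
# After dropping the leading singleton dim, the shapes become (64, 4096) and
# (23, 4096), which concatenate fine along axis 0:
stacked = np.vstack([b.squeeze(0) for b in batches])
print(stacked.shape)  # (87, 4096)
```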
That's working now. Thanks! I wonder why this didn't show up before?
@briandw So this is an issue linked to a change of behavior in PyTorch reduction functions such as max, mean, sum, etc.
Say you have a tensor of size (23, 128, 4096). If you take torch.max (or torch.mean, ...) over the first dimension, you get a tensor of size:
(128, 4096) for recent versions of PyTorch
(1, 128, 4096) for old versions of PyTorch
So it means your version of PyTorch is too old. I will update the requirements part of the README and add an exception in models.py to handle this case.
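A minimal sketch of the behavior difference, assuming a recent PyTorch is installed (where reductions drop the reduced dimension by default and keepdim=True restores the old shape):

```python
import torch

t = torch.zeros(23, 128, 4096)

# Recent PyTorch: the reduced dimension is dropped by default.
emb = torch.max(t, 0)[0]
print(emb.shape)  # torch.Size([128, 4096])

# keepdim=True keeps a leading size-1 dim, matching the old behavior.
emb_old = torch.max(t, 0, keepdim=True)[0]
print(emb_old.shape)  # torch.Size([1, 128, 4096])
```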
Thanks
I'm getting this error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9815,) + inhomogeneous part.
on:
embeddings = infersent.encode(sentences, bsize=128, tokenize=False, verbose=True)
print('nb sentences encoded : {0}'.format(len(embeddings)))
I'm running the encoder/demo.ipynb notebook with Python 2.7 and PyTorch '0.1.12_1'. When running the line
embeddings = model.encode(sentences, bsize=128, tokenize=False, verbose=True)
I get the following error:
Not sure if this is related, but loading the model produces a warning