Closed yslin1995 closed 5 years ago
@matt-peters @PhilipMay
The output file is key-value, not an array. The keys are the sentence number cast as str. These lines probably don't iterate in deterministic order, similar to a dict:
with h5py.File('./ooooo.hdf5', 'r') as fin:
    for i in fin:
        ...
Try
import h5py

with h5py.File('./ooooo.hdf5', 'r') as fin:
    # num_sentences: the number of sentences that were dumped
    for i in range(num_sentences):
        print(fin[str(i)][...].shape)
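For what it's worth, the shapes reported in the question are exactly what you would get if the keys were visited in string-sorted order, where "10" sorts before "2". As far as I know, h5py iterates group members by name by default, which would explain it. Here is a minimal sketch of that effect. A plain dict stands in for the HDF5 file, and keys_in_sentence_order is a hypothetical helper, not part of bilm or h5py:

```python
# Sentence lengths reported for 111.txt, in file order (sentences 0..10).
lengths = [21, 11, 17, 16, 20, 14, 21, 23, 17, 17, 18]

# The dump stores each sentence under its index cast to str.
# Assumption: a plain dict stands in for the HDF5 file here.
store = {str(i): n for i, n in enumerate(lengths)}

# Assumed iteration order: keys sorted as strings, so "10" lands before "2".
string_order = sorted(store)
shapes_string_order = [store[k] for k in string_order]


def keys_in_sentence_order(keys):
    """Hypothetical helper: keep numeric keys and sort them by integer value."""
    return sorted((k for k in keys if k.isdigit()), key=int)


shapes_sentence_order = [store[k] for k in keys_in_sentence_order(store)]

print(string_order)
print(shapes_string_order)
print(shapes_sentence_order)
```

With a real file, the same idea would be to loop over keys_in_sentence_order(fin.keys()) instead of iterating over fin directly, which also avoids having to know num_sentences in advance.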
Thanks so much, problem solved!
Hi, when I run the following code, I found that the order of the sentences has changed after calling dump_bilm_embeddings(). For example, the lengths of all sentences in 111.txt are [21, 11, 17, 16, 20, 14, 21, 23, 17, 17, 18], but after dumping the sentence embeddings, their shapes are [21, 11, 18, 17, 16, 20, 14, 21, 23, 17, 17]. So you can see that the sentence with length 18 has been moved ahead. However, I didn't find any random or swap operations in dump_bilm_embeddings(), so why is that?

main.py:
111.txt: