Open ghost opened 5 years ago
Hi, the value of f[key] is a numpy array of shape (seq_len, dim). (If you use the recent patch and output all the layers, it will be (n_layer, seq_len, dim).) You can get the embedding for each word with numpy.split along the seq_len dimension.
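A minimal sketch of that split, using a made-up array in place of f[key] (the sizes 4, 8 and 3 are arbitrary placeholders, not values produced by the model):

```python
import numpy as np

# Stand-in for f[key]: 4 tokens with 8-dimensional vectors (made-up sizes).
seq_len, dim = 4, 8
sentence = np.arange(seq_len * dim, dtype=np.float32).reshape(seq_len, dim)

# np.split along axis 0 yields seq_len arrays of shape (1, dim);
# squeeze drops the leftover length-1 axis so each word vector is (dim,).
word_vectors = [piece.squeeze(0) for piece in np.split(sentence, seq_len, axis=0)]

# If all layers were written, the array is (n_layer, seq_len, dim),
# so the seq_len dimension is axis 1 instead of axis 0.
n_layer = 3
all_layers = np.zeros((n_layer, seq_len, dim), dtype=np.float32)
per_word = [piece.squeeze(1) for piece in np.split(all_layers, seq_len, axis=1)]
```

Each entry of word_vectors is then one word's embedding; in the all-layers case each entry of per_word has shape (n_layer, dim).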
So the sentence embedding is not averaged, I understand now. However, in f[key], the key should be the sentence itself, right?
Another problem that I mentioned in my issue concerns the input format. I suspect that I am doing something wrong, because when I print the length of f.keys() it returns 1 even though my input contains more than one sentence. So this loop is executed only once and treats all my sentences as a single one.
for key in list(f.keys()):
    print(key)
Am I doing something wrong?
So the sentence embedding is not averaged, I understand now. However, in f[key], the key should be the sentence itself, right?
Yes, the key should be the sentence itself.
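To line words up with their vectors, something like the sketch below may help. It assumes the key is the tab-separated sentence (as described elsewhere in this thread) and that the value has shape (seq_len, dim); words_with_vectors is a hypothetical helper name, not part of ELMoForManyLangs, and the example key and array are made up.

```python
import numpy as np

def words_with_vectors(key, value):
    """Pair each token of a tab-separated key with its row of `value`.

    Hypothetical helper: assumes `value` has shape (seq_len, dim).
    """
    tokens = key.split("\t")
    if len(tokens) != value.shape[0]:
        raise ValueError(
            f"{len(tokens)} tokens but {value.shape[0]} rows; "
            "the key may contain more than one sentence"
        )
    # Iterating over a 2-D array yields its rows, one per token.
    return list(zip(tokens, value))

# Made-up example: three tokens with 5-dimensional vectors.
key = "the\tcat\tsleeps"
value = np.ones((3, 5), dtype=np.float32)
pairs = words_with_vectors(key, value)
```

The length check is there because a mismatch between token count and row count is a quick sign that the sentences were not separated the way the reader expected.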
Am I doing something wrong?
Please check that your input file follows the conll format (https://github.com/HIT-SCIR/ELMoForManyLangs#use-elmoformanylangs-in-command-line) and that you specify the input format as conll.
Hi,
I am struggling to get the embedding for individual words. I used this command:
python -m elmoformanylangs test --input_format conll --input input.conllu --model ar.model --output_prefix ./output/ --output_format hdf5 --output_layer -1
And it dumps an hdf5-encoded file onto the disk, as described. However, as far as I understand, the file encodes a dict where the key is the tab-separated sentence and the value is its representation.
But when I print the key:
I can see that f.keys() contains only a single string key made up of all the sentences in the input file. 1) Why? And how do I get the representation of an individual sentence? 2) How do I get the representation of an individual word?
This is an example of my input with 2 sentences: