Hello, I'm using fastText for a research project. I'm using the unsupervised word embedding alignment to compare two corpora of English-language texts, whose words are suffixed with "_a" and "_b" (for example, the word "Commission" is "Commission_a" in corpus A and "Commission_b" in corpus B), together with a lexicon of 100 safe correspondences. Corpus A contains 13308 words and corpus B contains 8237 words. I run "unsup_align.py" from a sh script like this:
`echo "Unsupervised Example based on the 1-dimension clusters alignment"
if [ ! -d Aligned_Models/ ]; then
  mkdir -p Aligned_Models
fi
lexicon=./feedbacks/7937377/analysis/lexab.txt
model_src=Embedding_Models/corpus_a_clean.txt
model_tgt=Embedding_Models/corpus_b_clean.txt
output_src=Aligned_Models/corpus_a_clean_uns-aligned
output_tgt=Aligned_Models/corpus_b_clean_uns-aligned
python3 ../../GitHub/fastText/alignment/unsup_align.py --model_src "${model_src}" --model_tgt "${model_tgt}" --lexicon "${lexicon}" \
  --output_src "${output_src}" --output_tgt "${output_tgt}" --lr 25 --niter 10`
When I run the script, the initial mapping with convex relaxation is computed successfully, but the Wasserstein Procrustes step then raises an IndexError while indexing the first (source) matrix:
`Unsupervised Example based on the 1-dimension clusters alignment
Wasserstein Procrustes
Loading vectors from Embedding_Models/corpus_a_clean.txt
13308 word vectors loaded
Loading vectors from Embedding_Models/corpus_b_clean.txt
8237 word vectors loaded
Coverage of source vocab: 1.0000
Computing initial mapping with convex relaxation...
6.556953872272116
Done [180 sec]
Computing mapping with Wasserstein Procrustes...
Traceback (most recent call last):
  File "../../GitHub/fastText/alignment/unsup_align.py", line 98, in <module>
    nepoch=args.nepoch, reg=args.reg, nmax=args.nmax)
  File "../../GitHub/fastText/alignment/unsup_align.py", line 45, in align
    xt = X[np.random.permutation(nmax)[:bsz], :]
IndexError: index 16926 is out of bounds for axis 0 with size 13308`
What's going wrong? Why is it trying to access index 16926, which is out of bounds for both corpora? Thank you for your help.
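A possible explanation, sketched under assumptions: the failing line draws its indices from `np.random.permutation(nmax)`, so any index up to `nmax - 1` can appear. An index of 16926 therefore implies that `nmax` is at least 16927, which is more than the 13308 rows of the source matrix (and the 8237 rows of the target). The command above does not pass an `nmax` value, so the script's default is presumably in effect. The snippet below reproduces the failure on a dummy matrix and shows a clamped sampler; `sample_batch`, the value `nmax = 20000`, and the 4-column matrix are illustrative assumptions, not code from fastText.

```python
import numpy as np

def sample_batch(M, nmax, bsz, rng):
    """Return up to bsz random rows of M, drawing indices below min(nmax, len(M))."""
    n = min(nmax, M.shape[0])          # clamp so every index is a valid row
    idx = rng.permutation(n)[:bsz]
    return M[idx, :]

rng = np.random.default_rng(0)
X = np.zeros((13308, 4))               # dummy stand-in: row count of corpus A, 4 arbitrary columns
nmax = 20000                           # assumed value; anything above 13308 can trigger the error

# Same indexing pattern as the line in the traceback, without clamping.
try:
    X[rng.permutation(nmax)[:200], :]
    print("no error this time (all sampled indices happened to be in range)")
except IndexError as err:
    print("IndexError:", err)

# Clamped version: indices are always below the number of rows.
batch = sample_batch(X, nmax, 200, rng)
print(batch.shape)
```

If the script exposes this value as a command-line option (the traceback shows it reads `args.nmax`), passing a value no larger than the smaller vocabulary, 8237 here, may be enough to avoid the error.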