Feeding the following real training dataset to a SeqGAN works perfectly:
X = np.random.randint(0, 20, (80, 20))
However, the following dataset with the same dimensionality but 6 symbols instead of 20 raises an error.
X = np.random.randint(0, 6, (80, 20))
In both cases, we used vocab_size = (number of unique symbols) + 1, as suggested in text_process.text_precess(). Here is the corresponding traceback:
Traceback (most recent call last):
  File "texygen/texygen.py", line 85, in train
    gan_func(X)
  File "texygen/models/seqgan/Seqgan.py", line 331, in train_real
    self.evaluate()
  File "texygen/models/seqgan/Seqgan.py", line 80, in evaluate
    scores = super().evaluate()
  File "texygen/models/Gan.py", line 55, in evaluate
    score = metric.get_score()
  File "texygen/utils/metrics/DocEmbSim.py", line 33, in get_score
    return self.get_dis_corr()
  File "texygen/utils/metrics/DocEmbSim.py", line 164, in get_dis_corr
    return np.log10(corr / len(self.oracle_sim))
ZeroDivisionError: division by zero
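Because Python raises ZeroDivisionError for plain int/float division by zero, the traceback suggests that len(self.oracle_sim) is 0 for the 6-symbol data, i.e. DocEmbSim apparently ends up with no oracle similarities to average over. The sketch below is only an illustration under that assumption: it regenerates both datasets, computes vocab_size with the rule above, and shows what a guard around the failing expression could look like. The names vocab_size_for and safe_log10_ratio are hypothetical and not part of Texygen.

```python
import numpy as np


def vocab_size_for(data: np.ndarray) -> int:
    # Rule used in this report: number of unique symbols + 1.
    return int(np.unique(data).size) + 1


def safe_log10_ratio(corr: float, n: int) -> float:
    # Hypothetical guard, NOT Texygen's code: mirrors the failing expression
    # np.log10(corr / len(self.oracle_sim)) but returns NaN when there is
    # nothing to average over, instead of raising ZeroDivisionError.
    if n == 0:
        return float("nan")
    return float(np.log10(corr / n))


np.random.seed(0)  # fixed seed so the example is reproducible
X20 = np.random.randint(0, 20, (80, 20))  # dataset that trains fine
X6 = np.random.randint(0, 6, (80, 20))    # dataset that triggers the error

print(vocab_size_for(X20), vocab_size_for(X6))
print(safe_log10_ratio(5.0, 0))
```

Returning NaN rather than raising keeps the other evaluation metrics running, but it only hides the symptom; the underlying question of why oracle_sim is empty for the smaller vocabulary remains open.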