geek-ai / Texygen

A text generation benchmarking platform
MIT License

Running python main.py -g leakgan raises an error, and data/shi.txt has an encoding bug #15

Closed cyzLoveDream closed 6 years ago

cyzLoveDream commented 6 years ago

Traceback (most recent call last):
  File "main.py", line 85, in <module>
    parse_cmd(sys.argv[1:])
  File "main.py", line 67, in parse_cmd
    gan = set_gan(opt_arg['-g'])
  File "main.py", line 26, in set_gan
    gan = Gan()
  File "E:\all_code\aspectLevel\Texygen\models\leakgan\Leakgan.py", line 64, in __init__
    self.sequence_length = FLAGS.length
  File "D:\Anacond\lib\site-packages\tensorflow\python\platform\flags.py", line 84, in __getattr__
    wrapped(_sys.argv)
  File "D:\Anacond\lib\site-packages\absl\flags\_flagvalues.py", line 630, in __call__
    name, value, suggestions=suggestions)
absl.flags._exceptions.UnrecognizedFlagError: Unknown command line flag 'g'

cyzLoveDream commented 6 years ago

Traceback (most recent call last):
  File "main.py", line 85, in <module>
    parse_cmd(sys.argv[1:])
  File "main.py", line 73, in parse_cmd
    gan_func(opt_arg['-d'])
  File "E:\all_code\aspectLevel\Texygen\models\textGan_MMD\Textgan.py", line 328, in train_real
    wi_dict, iw_dict = self.init_real_trainng(data_loc)
  File "E:\all_code\aspectLevel\Texygen\models\textGan_MMD\Textgan.py", line 290, in init_real_trainng
    self.sequence_length, self.vocab_size = text_precess(data_loc)
  File "E:\all_code\aspectLevel\Texygen\utils\text_process.py", line 75, in text_precess
    train_tokens = get_tokenlized(train_text_loc)
  File "E:\all_code\aspectLevel\Texygen\utils\text_process.py", line 50, in get_tokenlized
    for text in raw:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 2: illegal multibyte sequence
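
The 'gbk' in the second traceback suggests the file was opened with the Windows default codec rather than an explicit one. The following is only a minimal sketch of reading a text file with an explicit encoding; it assumes data/shi.txt is UTF-8 encoded, which the thread does not confirm, and read_lines is a hypothetical helper, not a function from Texygen.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file line by line using an explicit encoding.

    Hypothetical helper: passing `encoding` avoids falling back to the
    platform default codec (for example 'gbk' on some Windows setups).
    """
    with open(path, encoding=encoding) as raw:
        return [line.strip() for line in raw]


# Small self-contained demo: write a UTF-8 sample file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("床前明月光\n疑是地上霜\n")
    lines = read_lines(sample)
```

If the file turns out to use a different encoding, the same helper can be called with that codec name instead.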

Yaoming95 commented 6 years ago

Please refer to https://github.com/geek-ai/Texygen/issues/14

PualTorresZhang commented 6 years ago

Hello @cyzLoveDream, did you solve the first problem? I ran into the same error. If you did, please let me know how.

ck37 commented 6 years ago

I also ran into this problem, which seems to be caused by TensorFlow parsing the command-line flags inside the leakgan code.

I solved it by modifying main.py to run leakgan by default, so that I wouldn't have to specify the -g leakgan on the command line:

        if '-g' not in opt_arg:
            # print('unspecified GAN type, use MLE training only...')
            # gan = set_gan('mle')
            print('unspecified GAN type, use leak GAN training only...')
            gan = set_gan('leakgan')
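
The workaround above avoids the error by never putting -g on the command line. Another possible approach, sketched here and untested against Texygen, is to parse the script's own options first and hand any later flag parser an argument list without them. The option string "g:d:" and the helper consume_options are assumptions for illustration only; only the -g and -d options appear in the tracebacks in this thread.

```python
import getopt


def consume_options(argv, shortopts="g:d:"):
    """Split argv into parsed options and a cleaned argv.

    Hypothetical helper: returns a dict of the recognised options and a
    new argument list that keeps only the program name plus any leftover
    positional arguments, so a later flag parser never sees '-g'.
    """
    opts, remaining = getopt.getopt(argv[1:], shortopts)
    opt_arg = dict(opts)
    cleaned = [argv[0]] + remaining
    return opt_arg, cleaned


opt_arg, cleaned = consume_options(
    ["main.py", "-g", "leakgan", "-d", "data/shi.txt"]
)
# In a real script, one could assign `sys.argv = cleaned` before the
# model is constructed; whether that is enough for leakgan is an assumption.
```

Unlike changing the default model, this keeps the command-line interface unchanged, at the cost of touching sys.argv before the model is built.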

PualTorresZhang commented 6 years ago

This is really helpful, thank you!