gitabtion / SoftMaskedBert-PyTorch

🙈 An unofficial implementation of SoftMaskedBert based on huggingface/transformers.
MIT License
93 stars 17 forks

Can't resolve this error. Could the author please take a look? #21

Open Oldsport-996 opened 2 years ago

Oldsport-996 commented 2 years ago

(gitabtion) F:\0code\gitabtion>python main.py --mode preproc
Namespace(accumulate_grad_batches=16, batch_size=16, bert_checkpoint='bert-base-chinese', device=device(type='cpu'), epochs=10, gpu_index=0, hard_device='cpu', load_checkpoint=False, loss_weight=0.8, lr=0.0001, mode='preproc', model_save_path='checkpoint', warmup_epochs=8)
preprocessing...
Traceback (most recent call last):
  File "main.py", line 99, in <module>
    main()
  File "main.py", line 63, in main
    preproc()
  File "F:\0code\gitabtion\src\data_processor.py", line 201, in preproc
    for item in read_data(get_abs_path('data')):
  File "F:\0code\gitabtion\src\data_processor.py", line 131, in read_data
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence

I tried editing line 129 of data_processor.py to set the encoding to 'utf-8', ascii, and others, but none of them worked. This is really frustrating. What is the problem?
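For anyone hitting the same traceback: the 'gbk' in the error message suggests the file was decoded with the Windows locale default, which is what `open()` uses when no `encoding` is passed. Since forcing 'utf-8' reportedly did not help either, the data file may be in yet another encoding. Below is a minimal sketch of a fallback reader, assuming the file is either UTF-8 (with or without BOM) or a GB-family encoding. The names `read_lines_with_fallback` and `CANDIDATE_ENCODINGS` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Hypothetical candidate list (an assumption, not taken from the repository).
# "utf-8-sig" handles UTF-8 with or without a BOM; "gb18030" is a superset of GBK.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030")


def read_lines_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (lines, encoding) for the first encoding that decodes the whole file."""
    raw = Path(path).read_bytes()  # read bytes so no implicit locale decoding happens
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next candidate
        return text.splitlines(), enc
    raise ValueError(f"none of {encodings} could decode {path}")
```

Calling `lines, enc = read_lines_with_fallback(some_path)` also tells you which encoding actually worked, which can then be passed explicitly as `open(some_path, encoding=enc)` in the script. This is only a heuristic: a file can decode without error under the wrong encoding and still produce garbled text, so it is worth checking a few lines by eye.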

gitabtion commented 2 years ago

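The data processing script in this repository does have some problems. You can process the data with the BertBasedCorrectionModels repository first, and then train with this repository.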

hongge778 commented 1 year ago

Bro, how did you end up solving this problem?