dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License
1.39k stars 305 forks source link

How can i apply your code for Chinese? #45

Open shaofengzeng opened 4 years ago

shaofengzeng commented 4 years ago

Excuse me, can I use your code for Chinese...

dmmiller612 commented 4 years ago

The only limitation right now for Chinese is that you would need a Bert Model and tokenizer that uses Chinese. If you have both the tokenizer and model, you can easily pass it in for summarization.

shaofengzeng commented 4 years ago

OK, thanks

1615070057 commented 4 years ago

Excuse me, can I use your code for Chinese...

Hello, have you succeeded in applying 'bert-extractive-summarizer' to Chinese summarization? I don't know how to modify it, so I would like to ask.

BIRlz commented 4 years ago

OK, thanks

Have you ever tested this model on a Chinese dataset? It didn't work on my dataset and output nothing.

dmmiller612 commented 4 years ago

It would need a Chinese based bert model. I am not sure if the bert-multilingual model supports Chinese or not. This would need to be in the form of a huggingface transformer.

ttxs69 commented 4 years ago

I have tried using the bert-base-chinese model, but it outputs nothing. This is my code:

from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
custom_config = AutoConfig.from_pretrained('bert-base-chinese')
custom_config.output_hidden_states=True
custom_tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
custom_model = AutoModel.from_pretrained('bert-base-chinese', config=custom_config)

from summarizer import Summarizer

body = '这是一个测试句子'
model = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)
model(body)

ttxs69 commented 4 years ago

I have solved the problem. The default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612
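As a rough illustration of why an English-oriented segmenter returns nothing useful here (this is plain Python standing in for spaCy's sentencizer, not the library's actual code):

```python
import re

text = "这是第一句话。这是第二句话。这是第三句话。"

# An English-oriented segmenter looks for ASCII sentence-final
# punctuation (. ! ?) — none of which appears in this text:
english_sents = [s for s in re.split(r"(?<=[.!?])", text) if s]
# -> the whole text comes back as a single chunk

# Splitting on full-width Chinese punctuation finds the boundaries:
chinese_sents = [s for s in re.split(r"(?<=[。!?])", text) if s]
# -> three sentences
```

With no sentence boundaries found, an extractive summarizer has no candidate sentences to rank, which is consistent with the empty output reported above.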

Bibabo-BUPT commented 4 years ago

I have solved the problem. The default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spacy? I can't find it. Thank you!

ttxs69 commented 3 years ago

I have solved the problem. The default spaCy pipeline uses English for sentence segmentation; just change it to Chinese and it works well. Thanks @dmmiller612

Excuse me, where did you change it to use jieba rather than spacy? I can't find it. Thank you!

Sorry to reply so late. Just change two lines of code in sentence_handler.py. Change https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L3 to

from spacy.lang.zh import Chinese

and change https://github.com/dmmiller612/bert-extractive-summarizer/blob/f94c0243954171b2e5233d2624a8d2fcad1ea9ba/summarizer/sentence_handler.py#L8 to

def __init__(self, language=Chinese):

With those changes, the code in https://github.com/dmmiller612/bert-extractive-summarizer/issues/45#issuecomment-650879240 works well.

lmq990417 commented 3 years ago

@ttxs69 Why is the final output just the original Chinese text after I modified the model for Chinese following your steps? I urgently want to know; hope you can reply!

jnkr36 commented 3 years ago

@ttxs69 Why is the final output just the original Chinese text after I modified the model for Chinese following your steps? I urgently want to know; hope you can reply!

I just tried, and it works after following the steps to change the two lines of code. You can step into model(body) to debug.

lmq990417 commented 3 years ago

@ttxs69 OK, thanks, I will try. If it is convenient, could you please send me a copy of the code you run? My email address is cnlimaoqian@163.com.

lmq990417 commented 3 years ago

@jnkr36 I'm sorry that I read the wrong name this morning. First of all, thank you very much for replying to me. I'm in a bit of a hurry now, but I can't find the mistake, so I will try the method you described. At the same time, if it is convenient, could you please send me a copy of the code you run? My email address is cnlimaoqian@163.com! Thank you very much again.

lmq990417 commented 3 years ago

@jnkr36 I came again! I just have a question: have you downloaded zh_core_web_sm before?

jnkr36 commented 3 years ago

@jnkr36 I came again! I just have a question: have you downloaded zh_core_web_sm before?

Sorry for the late response. I have sent you my project; please check your email. Any other questions, we can talk again.

FrontMage commented 3 years ago

Just for convenience, I forked the repo and modified it per the suggestion above; it works nicely.

pip install git+https://github.com/FrontMage/bert-extractive-summarizer.git

tuzcsap commented 3 years ago

@FrontMage Hello! I've installed your modified fork, transformers, spacy 3.0.0, and downloaded zh_core_web_sm, then tried to run the model as in ttxs69's snippet, but the model generates empty output on Chinese sentences. Could you please provide more details on your setup?

zhangsirf commented 2 years ago

If it is convenient, could you please send me a copy of the code you run? My email address is zhangf0308@gmail.com. Thanks!

zhangsirf commented 2 years ago

@ttxs69 Why is the final output just the original Chinese text after I modified the model for Chinese following your steps? I urgently want to know; hope you can reply!

I just tried, and it works after following the steps to change the two lines of code. You can step into model(body) to debug.

If it is convenient, could you please send me a copy of the code you run? My email address is zhangf0308@gmail.com thanks

ilingen commented 2 years ago

If the output is just the original text: I found out that you need to make sure every sentence in your long text ends with a Chinese period (。).
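A minimal sketch of that fix, assuming the source text mixes ASCII and full-width punctuation (note that blindly replacing "." will also mangle decimals and URLs, so only apply this to plain prose):

```python
def to_fullwidth_punct(text: str) -> str:
    """Replace ASCII sentence-final punctuation with the full-width
    forms that a Chinese sentence segmenter splits on."""
    return text.replace(".", "。").replace("?", "?").replace("!", "!")

body = to_fullwidth_punct("这是第一句话.这是第二句话.")
# every sentence now ends with 。, so the Chinese sentencizer
# can find the sentence boundaries before summarization
```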