ChenRocks / fast_abs_rl

Code for ACL 2018 paper: "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting" (Chen and Bansal)
MIT License

Decoding stuck at the 1320th json input #57

Closed: timoderbeste closed this issue 5 years ago

timoderbeste commented 5 years ago (30 Jul 2019)

Hi! I am trying to summarize some text files (not from the CNN/Daily Mail dataset) using your decode_full_model.py. When the number of files was small (around 1,000), it worked perfectly. However, I have around 1 million files to summarize in total, and the decoding process got stuck at the 1320th file. I restarted decoding several times, and each time it got stuck at the same file. The screenshot below shows the output.

[Screenshot: decoding output, stuck at the 1320th file]

I am wondering what could cause this. I did not modify the code; instead, I preprocessed my files so that they have the same structure as the one suggested on the website.

nickluijtgaarden commented 5 years ago

Hi!

Decoding seems to have some issues at beam > 1: when summaries have more than 20 sentences, the reranking has to do too many calculations and just gets stuck.
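To see why reranking can appear to hang rather than crash, here is a rough count. This is a sketch under an assumption, not fast_abs_rl's code: it assumes that with beam > 1 the reranker scores every combination of candidate rewrites across the extracted sentences (a cartesian product), keeping a hypothetical `K` candidates per sentence. `count_joint_hypotheses` and `K` are illustrative names. Under that assumption the work grows as K**n in the number of extracted sentences n.

```python
from itertools import product


def count_joint_hypotheses(num_sents: int, k: int) -> int:
    """Enumerate every way of picking one of k candidate rewrites per sentence.

    Assumption (illustrative only): the reranker scores each such combination.
    """
    per_sentence = [range(k)] * num_sents
    return sum(1 for _ in product(*per_sentence))


if __name__ == "__main__":
    K = 2  # hypothetical number of candidates kept per extracted sentence
    for n in (5, 10, 15, 20):
        enumerated = count_joint_hypotheses(n, K)
        assert enumerated == K ** n  # the count grows exponentially in n
        print(f"{n:2d} sentences -> {enumerated:,} combinations to score")
    # Past 20 sentences, enumerating is impractical; use the closed form
    print(f"30 sentences -> {K ** 30:,} combinations to score")
```

With K = 2, 20 sentences already means 2**20 = 1,048,576 combinations and 30 sentences means over a billion, each of which must be scored. That is consistent with decoding stalling on one long document while shorter ones finish quickly.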

My solution was to only take the first 20 sentences for the summary and skip the remaining ones when decoding. I am on holiday right now, but I can share my code later if you want.
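If you would rather not patch the decoder, a minimal alternative sketch is to truncate the inputs before decoding, so that no document has more than 20 sentences to extract from. This is not Nick's change (he modified the decoding functions). It assumes each JSON file stores its sentences as a list under an `"article"` key, as in the preprocessed CNN/Daily Mail files; `truncate_article`, `MAX_SENTS` and the directory names are hypothetical.

```python
import json
from pathlib import Path

MAX_SENTS = 20  # cut-off suggested in this thread


def truncate_article(src_dir: str, dst_dir: str, max_sents: int = MAX_SENTS) -> int:
    """Copy every *.json file from src_dir to dst_dir, keeping only the first
    max_sents entries of the "article" sentence list (assumed key name).

    Returns the number of files that were shortened.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    shortened = 0
    for path in sorted(src.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        sents = data.get("article", [])
        if len(sents) > max_sents:
            # Drop the trailing sentences; other fields are copied unchanged
            data["article"] = sents[:max_sents]
            shortened += 1
        (dst / path.name).write_text(json.dumps(data, indent=4), encoding="utf-8")
    return shortened
```

For example, `truncate_article("my_data/test", "my_data_truncated/test")` writes shortened copies to a new directory and leaves the originals untouched, so you can point the decoder at the new directory instead. The trade-off is that anything after sentence 20 can no longer appear in the summary.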

Kind regards,

Nick


timoderbeste commented 5 years ago

Thank you for your response!

I checked the config I was using and can confirm that my beam size was set to 1. However, the issue may be that the documents I am summarizing can be long. Could you suggest where I can modify the code so that only the first 20 sentences are used for the summary? I will give it a try.

Thanks again!

nickluijtgaarden commented 5 years ago

You can check out my branch here:

https://git.science.uu.nl/n.vandeluijtgaarden/legal-text-summarization/tree/develop/models/chen_bansal_2018

I made the changes somewhere in the decoding functions. Cheers!


timoderbeste commented 5 years ago

Great! Thanks a lot!


ChenRocks commented 5 years ago

Thanks for the solution.