Closed by shahbazsyed 6 years ago
Hi, are you using Lua 5.2? This looks like the following issue in tds: https://github.com/torch/tds/issues/25. As there is no fix for this yet, I suggest switching to luajit for the time being.
@jgehring I switched back to LuaJIT and I don't get this problem anymore. However, the command is not able to find the files for -trainpref. Is the argument -trainpref a path to the train folder, which contains two files named train.articles and train.summaries? What is the use of this prefix?
-{train,valid,test}pref are prefixes to files which end with the arguments of -sourcelang and -targetlang. For example, -trainpref /home/$USER/data/train -sourcelang articles -targetlang summaries will look for the two files /home/$USER/data/train.articles and /home/$USER/data/train.summaries.
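As a concrete sketch of that naming convention (the directory and files below are throwaway placeholders, not the real dataset):

```shell
# Build a throwaway directory following the <prefix>.<extension> convention.
DEMO=$(mktemp -d)
touch "$DEMO/train.articles" "$DEMO/train.summaries"

# With -trainpref "$DEMO/train" -sourcelang articles -targetlang summaries,
# fairseq would look for exactly these two files:
ls "$DEMO/train.articles" "$DEMO/train.summaries"
```

The same rule applies to -validpref and -testpref.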
@jgehring Would it be possible to provide a text summarization example in the README, starting from the provided dataset? Thanks.
You can pre-process abstractive summarization data in the same way as machine translation data. Just follow the steps for building the IWSLT example model in the README (https://github.com/facebookresearch/fairseq#training-a-new-model).
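For summarization data the call is the same as for machine translation, only with the article/summary file extensions. Below is a minimal sketch using assumed placeholder paths; it first checks that every <prefix>.<extension> file exists, and the final fairseq call is left commented because it mirrors the README and needs the real data:

```shell
# Placeholder data directory with tokenized files, one sentence per line.
DATA=$(mktemp -d)
for split in train valid test; do
  echo "a tokenized article sentence" > "$DATA/$split.articles"
  echo "a tokenized summary sentence" > "$DATA/$split.summaries"
done

# Verify every <prefix>.<extension> file is in place before preprocessing.
for split in train valid test; do
  for ext in articles summaries; do
    [ -f "$DATA/$split.$ext" ] || { echo "missing $DATA/$split.$ext"; exit 1; }
  done
done
echo "all prefix files present"

# The preprocessing step itself, following the IWSLT example in the README:
# fairseq preprocess -sourcelang articles -targetlang summaries \
#   -trainpref "$DATA/train" -validpref "$DATA/valid" -testpref "$DATA/test" \
#   -destdir data-bin/summarize
```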
@michaelauli Thank you. In that case I would have something like -sourcelang en -targetlang en, but there is no example dataset for text summarization (like the Gigaword dataset, Daily Mail dataset, CNN dataset, etc.) available at this point to run a working example, right?
Sure there is, see the data provided by https://github.com/facebookarchive/NAMAS
-sourcelang and -targetlang refer to file extensions; see my comment above (https://github.com/facebookresearch/fairseq/issues/16#issuecomment-301140815).
@michaelauli So, to be clear: in the case of the Neural Attention Model for Abstractive Summarization, the model was trained on LDC2012T21 (Annotated English Gigaword). So to reach a comparable BLEU score, the Gigaword dataset should be used, I guess. Why is there no pre-trained model for this (i.e., licensing issues, etc.)? Thank you very much for your help.
Yes, following the pre-processing in the NAMAS GitHub project.
Closing due to inactivity; please re-open if necessary.
Hi, I am trying to test this model on a summarization task (en->en). I am trying to preprocess my articles and summaries using fairseq preprocess, but I get the following error.
The command I use for preprocessing is:
fairseq preprocess -sourcelang articles -targetlang summaries -trainpref train -validpref valid -testpref test -destdir data-bin/summarize
I have the following tokenized files in my directory: train.articles, train.summaries, valid.articles, valid.summaries, test.articles, test.summaries, each containing one sentence per line.
Can someone kindly let me know what I am missing here?
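One thing worth checking (an assumption about the failure, since the exact error text is not shown above): a bare prefix such as train is resolved relative to the current working directory, so the command must be run from the directory that holds the tokenized files. A quick sanity check:

```shell
# Run this from the directory where you invoke `fairseq preprocess`.
# A bare -trainpref like "train" resolves relative to the current directory.
if ls train.articles train.summaries >/dev/null 2>&1; then
  echo "prefix files found in $(pwd)"
else
  echo "prefix files not in $(pwd); use an absolute path for -trainpref"
fi
```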