GoogleCodeExporter opened 8 years ago
Thanks, I fixed the second part (the missing download of
questions-phrases.txt). However, I don't know what the first problem is about -
this part of the script runs OK for me.
Original comment by tmiko...@gmail.com
on 15 Sep 2014 at 9:23
1. Is your shell case-insensitive? Also, does it implicitly add the .tar.gz
suffix?
You download UMBC-webbase-corpus and extract umbc_webbase_corpus.tar.gz.
2. The corpus contains two types of files - plain txt (.txt) and parsed files
(.possf2). I assume you are only interested in the txt files, so you want to
iterate over these files only.
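
A minimal sketch of point 2 in Python, assuming the archive has already been extracted into some directory. The helper name iter_txt_files and the directory argument are hypothetical and not part of the original script; the sketch only shows one way to visit the plain .txt files while skipping the .possf2 parsed files.

```python
from pathlib import Path
from typing import Iterator


def iter_txt_files(corpus_dir: str) -> Iterator[Path]:
    """Yield only the plain-text (.txt) files under corpus_dir, in sorted order."""
    for path in sorted(Path(corpus_dir).rglob("*")):
        # Skip directories and anything that is not a .txt file (e.g. .possf2).
        if path.is_file() and path.suffix == ".txt":
            yield path
```

Something like `for path in iter_txt_files("corpus"): ...` would then process the text files one at a time, where "corpus" stands for wherever the archive was unpacked.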
Original comment by roys...@gmail.com
on 16 Sep 2014 at 8:30
I just noticed that when downloading
http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus
through my browser I also get umbc_webbase_corpus.tar.gz, as in the script.
However, when I download it using wget, I get UMBC-webbase-corpus. This might
explain the difference. I also noticed that you handle the txt files only,
so that's cool.
Original comment by roys...@gmail.com
on 17 Sep 2014 at 8:25
I get umbc_webbase_corpus.tar.gz when using wget, so the issue must be
something else. If more people have the same problem as you, I may have to
update the script and give the output file an exact name.
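
Giving the output file an exact name could look roughly like the sketch below. It is written in Python rather than as a change to the script itself, and the helper name download_as is hypothetical; the only values taken from this thread are the URL and the expected file name umbc_webbase_corpus.tar.gz.

```python
import shutil
import urllib.request

# URL quoted earlier in this thread.
CORPUS_URL = "http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus"


def download_as(url: str, output_name: str) -> str:
    """Download url and save it as output_name, ignoring any server-suggested name."""
    with urllib.request.urlopen(url) as response, open(output_name, "wb") as out:
        # Stream the body to disk in chunks instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return output_name


# Example call (not executed here, since the corpus is a large download):
# download_as(CORPUS_URL, "umbc_webbase_corpus.tar.gz")
```

Because the local name is fixed by the caller, the later extraction step no longer depends on which name the client picks. With wget itself, the -O option sets the output file name in the same way.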
Original comment by tmiko...@gmail.com
on 17 Sep 2014 at 5:48
Original issue reported on code.google.com by
roys...@gmail.com
on 15 Sep 2014 at 6:42