GustavoJE closed this issue 3 years ago.
Thanks for trying NewsQA! Right, we should update the instructions about the .tar.gz.
Can you share the full Docker command that you ran AND where you ran it from (it should be from the root of the repo)?
I suspect there could be a problem with the mount (the `-v` parameter). Try:
```shell
docker run --rm -it -v ${PWD}:/usr/src/newsqa --name newsqa maluuba/newsqa /bin/bash --login -c 'ls /usr/src/newsqa/maluuba/newsqa'
```
You can also try setting the `-v` parameter explicitly instead of using `${PWD}`.
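A minimal sketch of setting the `-v` value explicitly, assuming bash. The helper `mount_arg` is hypothetical (it is not part of the newsqa repo): it resolves a directory you name to an absolute host path and appends the container path used in the commands above. `/path/to/newsqa` in the usage comment is a placeholder for wherever you cloned the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<absolute host path>:/usr/src/newsqa"
# for the directory given as $1, or fail if that directory does not exist.
mount_arg() {
  local host_dir
  # cd + pwd -P turns a relative or symlinked path into an absolute,
  # physical path; the command substitution fails if cd fails.
  host_dir="$(cd -- "$1" 2>/dev/null && pwd -P)" || return 1
  printf '%s:/usr/src/newsqa\n' "$host_dir"
}

# Usage (placeholder path; replace with your clone's location):
#   docker run --rm -it -v "$(mount_arg /path/to/newsqa)" --name newsqa maluuba/newsqa \
#     /bin/bash --login -c 'ls /usr/src/newsqa/maluuba/newsqa'
```

Keeping the whole `$(...)` in double quotes means the value still reaches docker as a single argument even if the path contains spaces.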
Oh, I think I see what happened with the download: there are a few options, and you're not required to use the tar.gz one.
Oh, I didn't see that option. So I should have put the tar.gz file instead of the .csv in the maluuba/newsqa folder? I will try it as soon as I can and update this issue. Thank you!
It should work with just the .csv.
I just ran the command from the repo root like this:
```shell
docker run --rm -it -v "${PWD}:/usr/src/newsqa" --name newsqa maluuba/newsqa /bin/bash --login -c 'ls /usr/src/newsqa/maluuba/newsqa'
```
(note the added double quotes around the `-v` argument)
and got:
```
TokenizerSplitter.java cnn.tgz data_generator.py dev_story_ids.csv simplify.py split_dataset.py stories_requiring_two_extra_newlines.csv test_story_ids.csv tokenize_dataset.py __init__.py cnn_stories.tgz data_processing.py newsqa-data-v1.csv span_utils.py stories_requiring_extra_newline.csv stories_to_decode_specially.csv tests train_story_ids.csv
```
I will now try defining it explicitly.
That looks good: I see newsqa-data-v1.csv there, so maybe the double quotes helped? Maybe you have a space in your `${PWD}`? Try running the original `docker run` command that gave you issues, but with double quotes around the `-v` parameter.
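To illustrate why a space in `${PWD}` could matter: in bash (and other POSIX shells), an unquoted variable expansion is split on whitespace, so the `-v` value can turn into several arguments. The sketch below uses a made-up directory name and a hypothetical `show_args` function that only prints what a command would receive; it does not call docker. Note that zsh, for example, does not split unquoted parameter expansions by default, so this applies to bash/sh.

```shell
#!/usr/bin/env bash
# Hypothetical path containing a space, standing in for ${PWD}.
repo_dir="/home/user/my projects/newsqa"

# Print the argument count, then each argument in brackets.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '  [%s]\n' "$@"
}

echo "Unquoted:"
show_args -v ${repo_dir}:/usr/src/newsqa     # split into 3 arguments
echo "Quoted:"
show_args -v "${repo_dir}:/usr/src/newsqa"   # stays as 2 arguments
```

With the unquoted form, the mount spec is broken at the space into `/home/user/my` and `projects/newsqa:/usr/src/newsqa`; the quoted form keeps it as one `-v` value.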
If I use the original command with the double quotes, I get the errors I posted first: the tests complain that they can't find newsqa-data-v1.csv. I forgot to mention that.
Weird, I don't know why this is happening. Can you try with newsqa-data-v1.tar.gz? It also goes in the maluuba/newsqa folder.
If I add the tar.gz file, it works as expected. It was my mistake, sorry. Thanks again!
It's weird that it didn't work with the .csv, but I'm glad it works for you now!
I download the CSV from Microsoft's site (which, by the way, is not a tar.gz), then I download "cnn.tgz" and "cnn_stories.tgz" and put them into the maluuba/newsqa folder alongside "newsqa-data-v1.csv". Then I build the Docker image and finally run it. However, I get the following errors:
```
EE
======================================================================
ERROR: setUpClass (maluuba.newsqa.tests.test_tokenize.TestNewsQaTokenize)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/newsqa/maluuba/newsqa/tests/test_tokenize.py", line 32, in setUpClass
    NewsQaDataset().dump(path=combined_data_path)
  File "/usr/src/newsqa/maluuba/newsqa/data_processing.py", line 80, in __init__
    "\n See the README in the root of this repo for more details." % dataset_path)
Exception:
/usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv
was not found. For legal reasons, you must first accept the terms and download the dataset from https://msropendata.com/datasets/939b1042-6402-4697-9c15-7a28de7e1321
 See the README in the root of this repo for more details.

======================================================================
ERROR: setUpClass (maluuba.newsqa.tests.test_newsqa.TestNewsQa)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/src/newsqa/maluuba/newsqa/tests/test_newsqa.py", line 36, in setUpClass
    cls.newsqa_dataset = NewsQaDataset()
  File "/usr/src/newsqa/maluuba/newsqa/data_processing.py", line 80, in __init__
    "\n See the README in the root of this repo for more details." % dataset_path)
Exception:
/usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv
was not found. For legal reasons, you must first accept the terms and download the dataset from https://msropendata.com/datasets/939b1042-6402-4697-9c15-7a28de7e1321
 See the README in the root of this repo for more details.

----------------------------------------------------------------------
Ran 0 tests in 0.001s

FAILED (errors=2)
```
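The exception above means /usr/src/newsqa/maluuba/newsqa/newsqa-data-v1.csv does not exist inside the container. A small host-side check, run from the repo root before `docker run`, can help separate "the file is not in the folder" from "the folder is not mounted". This is a sketch assuming bash; `check_newsqa_data` is a hypothetical helper, and accepting newsqa-data-v1.tar.gz as an alternative to the .csv follows the suggestion made in this thread rather than anything checked against the repo's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the NewsQA dataset file is present
# in the given folder (default: maluuba/newsqa, relative to the repo root).
check_newsqa_data() {
  local data_dir="${1:-maluuba/newsqa}"
  local f
  # Accept either form of the download discussed in this thread.
  for f in newsqa-data-v1.csv newsqa-data-v1.tar.gz; do
    if [ -f "$data_dir/$f" ]; then
      echo "found: $data_dir/$f"
      return 0
    fi
  done
  echo "missing: neither newsqa-data-v1.csv nor newsqa-data-v1.tar.gz in $data_dir" >&2
  return 1
}
```

If this passes on the host but the tests still report the file as missing, the `ls` command from the first reply shows whether the container actually sees the file through the mount.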