dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0
2.56k stars 538 forks source link

fix copy directly data #1432

Closed MoisesHer closed 3 years ago

MoisesHer commented 3 years ago

Description

This PR fixes the following issue found when getting SQuAD datasets on a MXNet docker container:

> nlp_data prepare_squad --version 1.1
Downloading /opt/mxnet/datasets/nlp/squad/train-v1.1.json from https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json...
30.3MiB [00:00, 30.5MiB/s]                                                                                                     
Downloading /opt/mxnet/datasets/nlp/squad/dev-v1.1.json from https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json...
4.85MiB [00:00, 57.8MiB/s]                                                                                                     
Traceback (most recent call last):
  File "/usr/local/bin/nlp_data", line 33, in <module>
    sys.exit(load_entry_point('gluonnlp', 'console_scripts', 'nlp_data')())
  File "gluon-nlp/src/gluonnlp/cli/data/__main__.py", line 37, in cli_main
    mod.main(sub_args)
  File "gluon-nlp/src/gluonnlp/cli/data/question_answering/prepare_squad.py", line 65, in main
    os.path.join(args.save_path, train_file_name))
OSError: [Errno 18] Invalid cross-device link: '/opt/mxnet/datasets/nlp/squad/train-v1.1.json' -> 'squad/train-v1.1.json'

Docker version 19.03.8, build afacb8b7f0

Checklist

Essentials

cc @dmlc/gluon-nlp-team, @sxjscience

sxjscience commented 3 years ago

@szha It looks that there are something wrong with the latest MXNet master. Lots of GluonNLP test cases have failed. Would it be related to the graph optimization PR?

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1432/fix_prepare_data_squad/index.html

sxjscience commented 3 years ago

@leezu @ZheyuYe To explain why we are changing from os.link to shutil.copyfile. We noticed some issues when running the experiment in the docker environment so we made the change. Also, SQuAD is not very large and it should be acceptable.

sxjscience commented 3 years ago

Should be unblocked temporarily by https://github.com/dmlc/gluon-nlp/pull/1433

sxjscience commented 3 years ago

@MoisesHer Would you try to merge the master?

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1432/fix_prepare_data_squad/index.html

codecov[bot] commented 3 years ago

Codecov Report

Merging #1432 (2ad77e0) into master (b9600b3) will decrease coverage by 0.20%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1432      +/-   ##
==========================================
- Coverage   85.51%   85.31%   -0.21%     
==========================================
  Files          53       53              
  Lines        6987     6987              
==========================================
- Hits         5975     5961      -14     
- Misses       1012     1026      +14     
Impacted Files Coverage Δ
src/gluonnlp/data/loading.py 78.11% <0.00%> (-5.29%) :arrow_down:
src/gluonnlp/data/tokenizers/yttm.py 81.89% <0.00%> (-0.87%) :arrow_down:
src/gluonnlp/data/tokenizers/subword_nmt.py 79.43% <0.00%> (+0.93%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b9600b3...2ad77e0. Read the comment docs.

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1432/fix_prepare_data_squad/index.html