awslabs / amazon-sagemaker-workshop

Amazon SageMaker workshops: Introduction, TensorFlow in SageMaker, and more
Apache License 2.0

Security issue in "Image classification transfer learning" notebook #4

Closed: seporaitis closed this issue 5 years ago

seporaitis commented 5 years ago

Hi,

I would like to bring to your attention that in notebooks/Image-classification-transfer-learning.ipynb, in the section "Fine tuning image classification model", there is this paragraph:

The image classification algorithm can take two types of input formats. The first is a recordio format, and the other is a lst format. Files for both these formats are available at http://data.dmlc.ml/mxnet/data/caltech-256/. In this example, we will use the recordio format for training and use the training/validation split specified here.

Note the link to http://data.dmlc.ml/mxnet/data/caltech-256/ and the same link under text "specified here".

The domain data.dmlc.ml no longer seems to be owned by whoever was providing the data in recordio format; it looks like a parked domain that is available to buy. Furthermore, and I think this is more dangerous, on some occasions it does not show the "Buy domain" page but redirects to this page instead:

[Image: Screenshot 2019-06-21 at 11 38 36]

Upon clicking, it opens a new window and asks the user to install an unknown Chrome extension:

[Image: Screenshot 2019-06-21 at 11 38 58]

Needless to say, I did not continue on that path.

I could not find an alternative source for the dataset in recordio format, but there is a Kaggle mirror of the raw dataset: https://www.kaggle.com/jessicali9530/caltech256
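
For anyone fetching datasets from third-party hosts inside a notebook, here is a minimal sketch of a guard against this kind of domain takeover. It is not taken from the notebook; the allowlist contents, the function names, and the idea that the dataset maintainer publishes a SHA-256 checksum through a separate trusted channel are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical allowlist: the hosts you have decided to trust (an assumption).
TRUSTED_HOSTS = {"www.kaggle.com"}


def is_trusted_url(url, trusted_hosts=TRUSTED_HOSTS):
    """Return True only for https URLs whose host is on the allowlist."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in trusted_hosts


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise ValueError if the file does not match the expected checksum."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return True
```

Calling is_trusted_url before downloading and verify_download afterwards would reject both the plain-http data.dmlc.ml link and any file whose contents changed after the domain changed hands, provided the expected checksum came from a source you trust.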

rabowskyb commented 5 years ago

Thanks for bringing this to our attention! We've removed that dead link.

In the code cell, data is downloaded from repositories managed by the MXNet team. These are in active use; see, for example, https://mxnet.incubator.apache.org/versions/master/faq/finetune.html.

seporaitis commented 5 years ago

Thank you!