allenai / sherlock

Code, data, models for the Sherlock corpus
Apache License 2.0

Forbidden for part of dataset #3

Closed simonjisu closed 2 years ago

simonjisu commented 2 years ago

Issue

Hi! Thank you for your great work. I am trying to explore your dataset (from Korea). However, I cannot reach some of the data URLs, especially those on the cs.stanford.edu domain. I tried adding headers with the Python requests module:

import requests

session = requests.Session()
url = 'https://cs.stanford.edu/people/rak248/VG_100K/2364638.jpg'
response = session.get(url, headers={'User-Agent': 'Mozilla/5.0'})
print(response.status_code)
# prints 403

How can I fix this issue? Below is a screenshot of what I see when I open the link above in Chrome:

image

jmhessel commented 2 years ago

Hi @simonjisu

Thanks for your interest in our work! I would recommend downloading the Visual Genome images locally rather than accessing them via the URLs. The images are available here: https://visualgenome.org/api/v0/api_home.html
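
Once the images are on disk, the dataset URLs can be mapped to local files instead of being requested over HTTP. The following is only a minimal sketch: the `visual_genome` root directory and the `url_to_local_path` helper are hypothetical names, and it assumes the unpacked archives keep the same folder name that appears in the URL (for example `VG_100K`).

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hypothetical directory where the downloaded Visual Genome archives were unpacked.
LOCAL_VG_ROOT = Path("visual_genome")


def url_to_local_path(url: str, root: Path = LOCAL_VG_ROOT) -> Path:
    """Map a remote Visual Genome image URL to the matching local file path.

    Assumption: the local layout mirrors the last two components of the URL
    path, i.e. <folder>/<image id>.jpg.
    """
    parts = PurePosixPath(urlparse(url).path).parts
    return root.joinpath(*parts[-2:])


url = "https://cs.stanford.edu/people/rak248/VG_100K/2364638.jpg"
local_path = url_to_local_path(url)
print(local_path)
if not local_path.exists():
    # The image has not been downloaded yet, or the local layout differs.
    print("not found locally; check the unpacked folder names")
```

If your local folder names differ from the ones in the URLs, adjust the helper to match rather than relying on this layout.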

Jack

jmhessel commented 2 years ago

(I will close for now because I think this addresses your question; feel free to re-open if I can be helpful further)

simonjisu commented 2 years ago

Thank you very much!😄