Mauville / MedCLIP

Medical image captioning using OpenAI's CLIP

Change all URLs to new CDN #7

Open: Mauville opened this issue 4 weeks ago

Mauville commented 4 weeks ago

MedPix has moved all of its content to a CDN.

Old links looked like https://medpix.nlm.nih.gov/images/full/synpic52419.jpg

Now they look like https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic17159.jpg

The dataset links need to be updated. A simple rename appears to be enough, but if the CDN keeps changing, this could become a recurring problem.
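The rename can be sketched as a small Python helper. It assumes each image keeps its file name (for example `synpic52419.jpg`) on the CDN, which is what the scraper fix in this issue relies on. `CDN_BASE`, `to_cdn_url`, and `rewrite_links` are hypothetical names, not part of the MedCLIP code. Keeping the CDN prefix in one constant means a future move would only require editing one line.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical constant: the CDN prefix quoted in this issue. If the CDN
# moves again, only this line needs to change.
CDN_BASE = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full"


def to_cdn_url(old_url: str, base: str = CDN_BASE) -> str:
    """Rewrite a MedPix image URL so it points at the CDN (illustrative helper)."""
    # Keep only the last path segment, e.g. "synpic52419.jpg"; urlparse drops
    # any query string or fragment before we take the file name.
    filename = posixpath.basename(urlparse(old_url).path)
    if not filename:
        raise ValueError(f"no file name in URL: {old_url!r}")
    return f"{base.rstrip('/')}/{filename}"


def rewrite_links(urls):
    """Apply to_cdn_url to every link in a dataset's URL list."""
    return [to_cdn_url(u) for u in urls]


if __name__ == "__main__":
    print(to_cdn_url("https://medpix.nlm.nih.gov/images/full/synpic52419.jpg"))
    # prints https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic52419.jpg
```

Because the helper keeps only the last path segment, a link that already points at the CDN comes back unchanged, so a partially migrated list can be passed through safely.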

A simple fix for the scraper is to add the following lines, which rewrite each URL to the CDN before downloading:

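```python
import urllib.request

# Keep the image file name from the old MedPix URL and fetch it from the CDN instead
filename = url.split("/")[-1]
url = f"https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/{filename}"
urllib.request.urlretrieve(url, f"/content/drive/Shareddrives/DeepLearning/data/output/{filename}")
```
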
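To make a future CDN move easier to spot, the scraper fix can be wrapped in a loop that records failed downloads instead of stopping at the first error, and skips files already on disk so a rerun only fetches what is missing. This is a sketch under assumptions: `download_all` and `CDN_BASE` are hypothetical names, and `out_dir` stands for whatever folder the scraper already writes to.

```python
import os
import urllib.request

# Hypothetical constant: the CDN prefix quoted in this issue.
CDN_BASE = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full"


def download_all(urls, out_dir, base=CDN_BASE):
    """Download each image from the CDN into out_dir (illustrative helper).

    Returns (saved, failed): local paths that now exist, and CDN URLs that
    could not be fetched, so a broken prefix shows up as failures, not a crash.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        filename = url.split("/")[-1]  # e.g. "synpic52419.jpg"
        dest = os.path.join(out_dir, filename)
        if os.path.exists(dest):  # already fetched on an earlier run
            saved.append(dest)
            continue
        cdn_url = f"{base}/{filename}"
        try:
            urllib.request.urlretrieve(cdn_url, dest)
            saved.append(dest)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            if os.path.exists(dest):
                os.remove(dest)  # drop a partial file so a rerun retries it
            failed.append(cdn_url)
    return saved, failed
```

After a full run, a non-empty `failed` list would be a hint that the CDN prefix has changed again and `CDN_BASE` needs updating.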
Qasim-Latrobe commented 4 weeks ago

Thanks for the prompt response and guidance. Yes, I am able to download the MedPix dataset 🙂