Beit-Hatfutsot / dbs-back

The MoJP RESTful API server
GNU Affero General Public License v3.0

Managing files from AWS S3 #146

Open Libisch opened 7 years ago

Libisch commented 7 years ago

Overview

Images scraped using Scrapy (from Bagnowka, for example) are uploaded to AWS S3, not to Google Storage (where all the photos are currently stored), even though the correct main_image_url is specified. In /bhs_api/item.py:

def get_image_url(image_id, bucket):
    return 'https://storage.googleapis.com/{}/{}.jpg'.format(bucket, image_id)
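
For reference, a minimal sketch of one possible fix: prefer an item's stored main_image_url over the hard-coded GCS pattern. The extra item parameter and its dict shape are assumptions for illustration, not the function's current signature:

def get_image_url(image_id, bucket, item=None):
    # Sketch: if the item already carries a full URL (e.g. an AWS S3 link
    # produced by the Bagnowka scraper), return it unchanged.
    if item is not None and item.get('main_image_url'):
        return item['main_image_url']
    # Otherwise fall back to the current Google Cloud Storage convention.
    return 'https://storage.googleapis.com/{}/{}.jpg'.format(bucket, image_id)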

Expected

The main image URL should not refer to GCS when another storage backend is specified in main_image_url.

"main_image_url": "https://s3-us-west-2.amazonaws.com/bagnowka-scraped/full/95eb0e2ccf2a62ad46939994718ba150.jpg"

Actual

The API returns the PictureId appended to the storage.googleapis.com URI:

"main_image_url": "https://storage.googleapis.com/bhs-flat-pics/95eb0e2ccf2a62ad46939994718ba150.jpg"

#131

Libisch commented 7 years ago

@OriHoch

OriHoch commented 7 years ago

@Libisch following our discussion -

  1. Bagnowka items should have some kind of indication that we should use their url attribute as-is
  2. the enrich_item function in item.py should check this attribute (see the sketch below)
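
A minimal sketch of what that check might look like in enrich_item, assuming a hypothetical keep_image_url flag that the scraper would set on Bagnowka items (the flag name and item shape are illustrative, not taken from the repo):

def enrich_item(item, bucket):
    # Hypothetical marker meaning "use the stored main_image_url as-is";
    # skip the GCS URL rewrite for items that carry it.
    if not item.get('keep_image_url'):
        picture_id = item.get('PictureId')
        if picture_id:
            item['main_image_url'] = get_image_url(picture_id, bucket)
    return item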

Libisch commented 7 years ago

Done. See #165

OriHoch commented 7 years ago

@Libisch if it's done, please assign to @TheGrandVizier for QA (if needed). If it doesn't require QA or it's very technical, you should assign it to yourself for QA.