Imageomics / cautious-robot

Simple image-from-CSV downloader that computes and records checksums on the downloaded image folder.
MIT License

Get image extensions when not present in `--img-name-col` column #3

Open thompsonmj opened 5 months ago

thompsonmj commented 5 months ago

The column used to write the filename may not have a file extension. A user might prefer to have images saved with extensions present in the filenames.

Another optional Boolean flag could be used to give the user the chance to specify something like:

  -e, --infer-extension  Infer the appropriate file extension if one is not present in the --img-name-col (default: False)

If this is switched on, some options for doing this (as mentioned in the discussion for #1) could be:

First, double check that there isn't a valid image extension in the --img-name-col to avoid writing a file with something like image.png.png (if there's an extension present and the user says -e, I can't imagine why they would have done so intentionally).
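A minimal sketch of that guard, assuming a hand-picked allow-list of image extensions (the issue does not say which ones should count). The names `IMAGE_EXTENSIONS`, `has_image_extension`, and `with_extension` are hypothetical and not part of cautious-robot:

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; which extensions count as "valid" is an assumption.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp", ".svg",
}


def has_image_extension(name: str) -> bool:
    """Return True if the name already ends in a known image extension."""
    return PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS


def with_extension(name: str, ext: str) -> str:
    """Append ext to name unless a known image extension is already present."""
    if has_image_extension(name):
        return name  # avoids results like image.png.png
    return f"{name}.{ext.lstrip('.')}"
```

Checking against an allow-list rather than "any suffix" means a name such as specimen.v2 is still treated as lacking an extension.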

To detect an extension:

egrace479 commented 5 months ago

Another option would be response.headers['Content-Type']. For an image, this should return something along the lines of image/png, in which case a simple .split("/")[1] would give us the proper filetype.

thompsonmj commented 5 months ago

That sounds like a good first place to look. For MIME type image/jpeg, we might want the extension to be .jpg instead. For most other image types, I think we can be pretty sure that .split("/")[1] will give a good extension.
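The Content-Type idea could be sketched roughly as follows. This is an illustration only: `extension_from_content_type` and `SUBTYPE_OVERRIDES` are hypothetical names, and the function takes the header value as a plain string rather than a response object. It also strips any parameters after a semicolon and handles image/svg+xml, two cases where a bare .split("/")[1] would give an awkward result.

```python
from typing import Optional

# Subtypes whose conventional file extension differs from the MIME subtype.
# The exact contents of this table are an assumption.
SUBTYPE_OVERRIDES = {"jpeg": "jpg", "svg+xml": "svg"}


def extension_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Map a Content-Type header value to an extension (without the dot).

    Returns None when the header is missing or is not an image type.
    """
    if not content_type:
        return None
    # Drop parameters such as "; charset=..." and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    main, _, subtype = media_type.partition("/")
    if main != "image" or not subtype:
        return None
    return SUBTYPE_OVERRIDES.get(subtype, subtype)
```

With a requests response this might be called as extension_from_content_type(response.headers.get("Content-Type")), so a missing header yields None instead of raising KeyError.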

Is response.headers['Content-Type'] guaranteed to always be there though? Probably rare, but might want a backup check in case it's absent.
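One possible backup, which is an assumption here and not something proposed in the thread, is to look at the first few bytes of the downloaded content and match them against well-known file signatures. A minimal sketch covering a handful of formats, with the hypothetical name `extension_from_bytes`:

```python
from typing import Optional


def extension_from_bytes(data: bytes) -> Optional[str]:
    """Guess an image extension from leading magic bytes, or None if unknown.

    Only a few common formats are covered; this is not exhaustive.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tif"
    if data.startswith(b"BM"):
        return "bmp"
    return None
```

Since the image bytes are already in hand after the download, this check would need no extra request. It could be tried only when the header is absent or does not name an image type.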

johnbradley commented 5 months ago

Another way servers convey the appropriate filename is via content disposition: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#as_a_response_header_for_the_main_body
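As a rough sketch of reading that header, the standard library's email.message.Message can parse the filename parameter, so no hand-written parsing is needed. The function names below are hypothetical, and the header value is passed in as a plain string:

```python
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional


def filename_from_content_disposition(value: Optional[str]) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition value."""
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename()


def extension_from_content_disposition(value: Optional[str]) -> Optional[str]:
    """Return the filename's extension (lowercase, no dot), or None."""
    filename = filename_from_content_disposition(value)
    if not filename:
        return None
    suffix = PurePosixPath(filename).suffix
    return suffix.lstrip(".").lower() or None
```

Many servers do not send this header for inline images, so it would probably work best as one check among several, not as the only source.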