astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

cachedataset extension derivation broken #552

Closed msdemlei closed 3 months ago

msdemlei commented 3 months ago

Consider this code:

import pyvo

ACCESS_URL = "http://dc.g-vo.org/maidanak/res/rawframes/siap/siap.xml?"

svc = pyvo.sia.SIAService(ACCESS_URL)
images = svc.search((340.1, 3.36), size=(0.1, 0.1))
images[0].cachedataset()

At the moment, this writes a file named Maidanak_Q2237+0305_2004-08-24_00:00:00_Johnson_R-1.None (or something like it) to disk; the extension .None is clearly wrong.

The reason extension inference fails is that dal.mimetype.mime2extension, probably for reasons lost in time, converts the media type to a bytestring before passing it on to the standard-library module mimetypes. Passing in a str instead fixes that and yields .fits (on systems with a suitable mime.types), as expected.
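
To illustrate the str-versus-bytes point, here is a minimal sketch using only the standard library. The helper extension_for is hypothetical and is not pyvo's mime2extension; it only shows the idea of always handing mimetypes a str. It uses image/png because that type is in the module's built-in table, whereas the result for FITS media types can depend on the system's mime.types.

```python
import mimetypes


def extension_for(media_type):
    """Return an extension such as '.png' for a media type, or None.

    Hypothetical helper for illustration; not pyvo's mime2extension.
    """
    # Accept bytes for convenience, but always hand a str to mimetypes.
    if isinstance(media_type, bytes):
        media_type = media_type.decode("ascii")
    # Drop any parameters, e.g. "image/png; charset=binary".
    media_type = media_type.split(";", 1)[0].strip()
    return mimetypes.guess_extension(media_type)


# Both spellings go through the same str-based lookup.
print(extension_for("image/png"))
print(extension_for(b"image/png"))

# The situation described above: a bytestring handed straight to mimetypes.
print(mimetypes.guess_extension(b"image/png"))
```

Comparing the last printed value with the first two shows why the lookup has to be done with a str.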

andamian commented 3 months ago

It could also check for Content-Disposition if any of the services return it which in this case it does: Content-Disposition: attachment; filename="red_nh240090.fits.gz". That would avoid the guesswork.
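
As a rough sketch of what reading that header could look like, assuming the raw header value is already available as a string: the function name filename_from_content_disposition is hypothetical and nothing here is existing pyvo API. It relies on the standard library's email.message.Message to parse the header parameters.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration; not part of pyvo.
    """
    if not header_value:
        return None
    # Let the email package do the parameter parsing (quoting, RFC 2231).
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


# Header value quoted in the comment above.
header = 'attachment; filename="red_nh240090.fits.gz"'
print(filename_from_content_disposition(header))
```

A caller would presumably still need a fallback (for instance the current title-plus-extension scheme) for services that do not send the header.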

msdemlei commented 3 months ago

On Thu, Jun 13, 2024 at 04:15:21PM -0700, Adrian wrote:

> I could also check for Content-Disposition if any of the services return it which in this case it does: Content-Disposition: attachment; filename="red_nh240090.fits.gz". That would avoid the guesswork.

I'm not against that, though the current behaviour of using the image title actually yields "better" results in this case. Anyway, I'd say that's a different (and potentially mildly breaking) change that should not be mixed up with the current bug fix.