astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

cachedataset hands through slashes in image titles to the file system #556

Closed msdemlei closed 3 months ago

msdemlei commented 3 months ago

If a SIA service returns an image title containing a slash, SIARecord's cachedataset fails with a hard-to-understand error message such as:

FileNotFoundError: [Errno 2] No such file or directory: './ROSAT_HRI_ROSAT_Soft/Medium_X-Ray_1997-11-20_09:20:30.795013.fits'

At the time of writing, the following code reproduces the problem:

import pyvo

ACCESS_URL = "http://dc.zah.uni-heidelberg.de/rosat/q/im/siap.xml?"

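# Query the SIA service, then try to cache the first image locally
svc = pyvo.sia.SIAService(ACCESS_URL)
images = svc.search((340.1, 3.36), size=(0.1, 0.1))
images[0].cachedataset()  # fails with the FileNotFoundError shown above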

To fix this, we should defuse certain active characters in image_title. This would be at least the slash and probably the backslash.
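
A minimal sketch of what such defusing could look like. The helper name `defuse_title` and the underscore as replacement character are assumptions for illustration; this is not pyvo's actual code.

```python
def defuse_title(title, replacement="_"):
    """Replace path separators so the title can serve as a single file name.

    Hypothetical helper for illustration; not part of pyvo.
    """
    # Slash and backslash are the two characters named in this issue.
    for char in ("/", "\\"):
        title = title.replace(char, replacement)
    return title


# The title from the error message above, without the ".fits" suffix.
name = defuse_title("ROSAT_HRI_ROSAT_Soft/Medium_X-Ray_1997-11-20_09:20:30.795013")
print(name + ".fits")
```

With the slash replaced, the resulting name no longer points into a non-existent `ROSAT_HRI_ROSAT_Soft` directory.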

Given there are several sorts of conceivable attacks when you expose various file systems to random strings from the internet, one could argue we should only let through printable ASCII characters (32 to 126 inclusive, since 127 is the DEL control character), but I think we can by and large trust people who put up SIAP services. If anyone argues we should not, this would be the opportunity to do something sensible with non-ASCII characters here.
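
For comparison, the stricter allowlist variant might look roughly like this. The helper name `ascii_only_title` and the underscore replacement are again assumptions, not anything pyvo provides. The sketch keeps printable ASCII only (code points 32 to 126, leaving out DEL at 127) and still replaces the two path separators.

```python
def ascii_only_title(title, replacement="_"):
    """Keep printable ASCII except path separators; replace everything else.

    Hypothetical helper for illustration; not part of pyvo.
    """
    return "".join(
        # 32..126 is the printable ASCII range; "/" and "\" are rejected too.
        char if 32 <= ord(char) <= 126 and char not in "/\\" else replacement
        for char in title
    )


# Slash, non-ASCII letter, and newline are each replaced by an underscore.
print(ascii_only_title("Soft/Medium Röntgen\n"))
```

The trade-off is that any non-ASCII title loses information, which is why doing something more sensible with such characters would need a separate decision.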