astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
75 stars 52 forks source link

Defuse sia title characters #557

Closed msdemlei closed 3 months ago

msdemlei commented 3 months ago

This is intended as a fix for bug #556. To keep things simple, I'm basing this on the branch of PR #553, because this plays in the vicinity of its code.

As mentioned in the commit message, one might argue we should lock down the file names more strongly than that. I'd be open to that, but only if people actually feel there is a sufficient need for that.

fixes #556

bsipocz commented 3 months ago

Rebased to resolve conflicts, and also removed the commit from the other PR as it has been merged already.

bsipocz commented 3 months ago

Same as for #553, I change the milestone for the bugfix, if that comes out sooner than 1.6

msdemlei commented 3 months ago

On Thu, Jun 27, 2024 at 02:42:24PM -0700, Adrian wrote:

  1. I would replace each \ with _ to correctly keep track of the replacements. Single \ might also create issues.

This is what already happens here; '\' is the literal for a single backslash.

  1. I've noticed that suggest_dataset_basename fixes empty spaces too but this is not the case if base is present. Should we do this sanitization in one place (and potentially add more rules to it over time)?

Hm... I'm not even sure whether I think replacing whitespace is a good idea in the first place; I think even ancient FAT would support that, and as to requiring quoting in common shells... Well, if we wanted to make sure you get away without that, we'd have to do a lot more mangling.

Anyway, I think that's a separate issue from repairing cachedataset() for image titles with (back-)slashes. I give you that refactoring suggest_dataset_basename (which has fairly parallel implementations in ssap and sia on top) might be an improvement, and something that produces filenames good across many platforms in a central place sounds like a good idea, too (where we'd have to discuss whether we'd want to limit the charset to ASCII printables...). What about filing a bug?

andamian commented 3 months ago

Issue #573 - please feel free to add or change the details.