Open retifrav opened 2 years ago
On Thu, Aug 04, 2022 at 07:01:47AM -0700, retif wrote:
You can't see it here, as the symbol is kind of invisible, but the problematic place does have this <0xa0> symbol in Section<0xa0>[chap:xmdr1], which I guess is what is causing the problem.
Yeah, non-breaking space.
If TAP specification dictates all the services to return results in
It's not TAP directly, it's VOTable that still says, in effect, "char is ASCII only". What you see here is (I say that without actually having followed it) an exception re-raised from within the deep bowels of Astropy's VOTable parser.
I grant you the message could be a bit more graceful ("The operators have stuck non-ASCII into a char field. Scold them") -- or we could do client side what DaCHS does server-side in such cases: replace all non-ASCII with question marks. Perhaps we should raise a bug against Astropy regarding that? I don't think a patch to that effect would be hard to do, except that we have >= 3 serialisations, the behaviours of which would have to be kept in sync.
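For illustration, the client-side replacement mentioned above could look roughly like this. It is only a sketch of the idea, not what Astropy or DaCHS actually do, and the helper name `asciify` is made up here.

```python
def asciify(text: str) -> str:
    """Replace every non-ASCII character with a question mark."""
    # errors="replace" makes the ASCII encoder emit "?" for anything
    # it cannot represent, instead of raising UnicodeEncodeError.
    return text.encode("ascii", errors="replace").decode("ascii")


# The non-breaking space (U+00A0) from the report is replaced by "?".
print(asciify("Section\u00a0[chap:xmdr1]"))
```

The replacement is lossy: once the text has been converted, the original character cannot be recovered.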
On the VO side, we could also finally decree that VOTable char should allow UTF-8, which has been brought up in the IVOA now and then (cf. http://mail.ivoa.net/pipermail/apps/2014-August/000968.html ff). I'd argue in favour of it again if someone brought it to the apps list. After so many years, I reckon the opponents may have reconsidered.
Be that as it may: pyVO can do nothing about it. ESAC needs to fix their service, either making sure there's only ASCII in their descriptions or perhaps declaring their description column unicodeChar(). I think that would be ok by TAP, which says description must have the data type "string"; to explain that, it says "implementers may choose an appropriate data type that behaves the same way in queries and output (e.g. varchar(16) or varchar(64) for string...)". I'd read that as including unicodeChar(), and I'd be surprised if anything in pyVO had a problem with that.
Who will take that to ESAC?
@jespinosaar - pinging you, as this non-ASCII character in the response needs to be handled upstream: either ESAC removes it or, if you need it on the server side, the change to the standard has to be pushed through IVOA.
Hi @bsipocz , we will check what is happening, thanks for reporting!
Thank you @retifrav for this information. If you open the JSON file (the one downloaded by curl) with vi and set the following options, you can see the offending character:
:set listchars=nbsp:×,tab:\ \ ,trail:\ ,
:set list
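The same check can be done without an editor. Below is a minimal Python sketch that lists every non-ASCII character in a string together with its offset and Unicode name. The function name `find_non_ascii` and the sample string are illustrative assumptions, not part of PyVO or the service response.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for offset, char in enumerate(text):
        if ord(char) > 127:
            # Fall back to "UNKNOWN" for characters without a Unicode name.
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((offset, f"U+{ord(char):04X}", name))
    return hits


# Hypothetical sample mimicking the problematic description text.
sample = "Section\u00a0[chap:xmdr1]"
for hit in find_non_ascii(sample):
    print(hit)
```

To scan the downloaded file, one could read it as UTF-8 and pass its contents to the function, assuming the file really is UTF-8.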
We have checked that the following catalogues would generate the same error:
OS: Mac OS 12.5
Python: 3.9.13
PyVO: 1.3
PyVO raises an exception with the following query:
The exception:
If I query the same thing with a bare cURL:
then the result is the following:
You can't see it here, as the symbol is kind of invisible, but the problematic place does have this <0xa0> symbol in Section<0xa0>[chap:xmdr1], which I guess is what is causing the problem.

If TAP specification dictates all the services to return results in ASCII only, then I'd say it's certainly the Gaia service's fault and not PyVO's, but even then I'd say it would be useful to be able to specify the encoding for reading the results (I'm assuming that the same string would read fine with UTF-8), as right now it seems to be "hardcoded" to ASCII.
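A small sketch of why the encoding matters, assuming the response bytes are UTF-8 (which is only a guess here). It does not touch PyVO at all; it merely shows how the same bytes behave under an ASCII and a UTF-8 decoder. The helper `try_decode` is made up for this example.

```python
# Assumption: the service sends the description as UTF-8, where the
# non-breaking space U+00A0 is encoded as the two bytes 0xC2 0xA0.
raw = "Section\u00a0[chap:xmdr1]".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning a short error note instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed at byte {exc.start}: {exc.reason}"


print(try_decode(raw, "ascii"))  # the ASCII decoder rejects byte 0xC2
print(try_decode(raw, "utf-8"))  # the UTF-8 decoder accepts the same bytes
```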