astropy / astroquery

Functions and classes to access online data resources. Maintainers: @keflavich and @bsipocz and @ceb8
http://astroquery.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License

coordinates issue for query_cross_id/query_cross_id_async #1494

Open estelleps opened 5 years ago

estelleps commented 5 years ago

I tried to use query_cross_id_async from the astroquery.vsa package to query the VIKING database. According to the documentation, the coordinates argument of query_cross_id_async should be "an array of one or more astropy SkyCoord objects specifying the objects to crossmatch against" (for example, coordinates=[SkyCoord(117, -25, unit='deg')]). However, this returns "TypeError: Argument cannot be parsed as a coordinate". [The error comes from /astroquery/wfau/core.pyc in _args_to_payload(self, *args, **kwargs): C = commons.parse_coordinates(args[0]).transform_to(coord.ICRS)]

If I instead pass a single SkyCoord object rather than an array (coordinates=SkyCoord(117, -25, unit='deg')), this raises another error later in the code: "TypeError: Scalar 'SkyCoord' object has no len()".

So what should be used as coordinates input?

keflavich commented 5 years ago

I think a SkyCoord array should be SkyCoord([117], [-25], unit=u.deg) rather than [SkyCoord(...)]. Other than that, I think parse_coordinates should be giving a more useful error message here.

estelleps commented 5 years ago

Indeed, using arrays inside SkyCoord solves the above issue.

Unfortunately, another error is raised afterwards. With Python 2.7 we later get TypeError: unicode argument expected, got 'str' from for crd in coordinates: fh.write("{0} {1}\n".format(crd.ra.deg, crd.dec.deg)).

With Python 3.6 the unicode error is not raised, but at the end we get NotImplementedError: It appears we haven't implemented the file upload correctly. Help is needed.
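
For reference, a minimal sketch of the write step that should avoid the Python 2.7 error. It assumes the buffer is an `io.StringIO`, which only accepts unicode text. The function name `write_upload_file` is hypothetical, and plain floats stand in for `crd.ra.deg` and `crd.dec.deg`.

```python
import io


def write_upload_file(radec_pairs):
    """Write one 'ra dec' line per position into an in-memory text buffer."""
    fh = io.StringIO()
    for ra, dec in radec_pairs:
        # The u"" prefix makes the format string unicode on Python 2.7;
        # on Python 3 it changes nothing because str is already unicode.
        fh.write(u"{0} {1}\n".format(ra, dec))
    fh.seek(0)  # rewind so that a later read() starts at the beginning
    return fh


fh = write_upload_file([(117.0, -25.0)])
print(repr(fh.read()))
```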

keflavich commented 5 years ago

Oh no. This unfortunately means there's a real bug we never resolved in the WFAU code for handling crossID searches.

keflavich commented 3 years ago

An MWE reproducing the issue:

from astroquery.vsa import Vsa
from astropy import units as u, coordinates
c = coordinates.SkyCoord([117], [-25], unit=u.deg)
result = Vsa.query_cross_id(c, programme_id='VIKING')

This fails. If we look at response.text in the debugger, it looks like:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>CrossID SQL Query Results</title>

    <script type="text/javascript" src="http://wsa.roe.ac.uk/configurestyles.js">
    </script>
  </head>

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#000080" alink="#FF0000"  >

    <h1>CrossID SQL Query Results</h1>

<p>Parsing input .....
<p>Parsing upload file ...
<p>files.txt uploaded file size: 0 bytes, 0 rows loaded
<p>Programme:

which suggests nothing was uploaded.

However, in the debugger, fh does have contents:

ipdb> fh.seek(0)
0
ipdb> fh.read()
'117.0 -25.0\n'

Maybe this is an issue with using the StringIO approach, instead of writing a tempfile? I don't think so, but I'm not certain.

The code in question: https://github.com/astropy/astroquery/blob/master/astroquery/wfau/core.py#L726-L833

keflavich commented 3 years ago

Ah, this is the solution:

ipdb> resp = self._request("POST", url=self.CROSSID_URL, files={'file':('file.txt', txt)})
ipdb> print(resp.text)
<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>CrossID SQL Query Results</title>

    <script type="text/javascript" src="http://wsa.roe.ac.uk/configurestyles.js">
    </script>
  </head>

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#000080" alink="#FF0000"  >

    <h1>CrossID SQL Query Results</h1>

<p>Parsing input .....
<p>Parsing upload file ...
<p>file.txt uploaded file size: 12 bytes, 1 rows loaded
<p>Programme:

MikeeRead commented 3 years ago

I've had a quick look at the code.

The service is expecting a multipart/form-data request. As far as I can see, if the python requests module is given the files parameter as a dictionary (?), then multipart/form-data is sent. I think I can see in the code where this takes place.

However the "file.txt uploaded file size: 0 bytes, 0 rows loaded" suggests none of the file contents are getting sent/read. If the data was sent but in the wrong format you'd get something like "xid2.txt uploaded file size: 19 bytes, 4 rows loaded Programme: UKIDSS Large Area Survey, LAS error reading coords at line 1"

Are you able to capture the full request before it is sent as a txt file?
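
One way to capture the request before it is sent, sketched with the requests library: build a `requests.Request`, call `prepare()`, and inspect the resulting headers and body without opening a connection. The URL below is a placeholder and the payload is an illustrative subset of the form fields; the `'file'` field name follows the debugger session earlier in the thread and may not be what the service expects.

```python
import requests

txt = "117.0 -25.0\n"
payload = {"radius": 5.0, "format": "VOT"}  # illustrative subset of the form fields

req = requests.Request(
    "POST",
    "http://example.invalid/CrossID",  # placeholder URL; nothing is sent
    data=payload,
    files={"file": ("file.txt", txt)},
)
prepared = req.prepare()

# With a files argument, requests encodes the body as multipart/form-data
# and generates the boundary itself.
print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))
```

The printed body can be saved to a text file and compared against a request that is known to work.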

I can run the crossID from perl building files that I then send via wget with

wget --header=\"Content-Type: multipart/form-data; boundary=FILEUPLOAD\" --post-file $uploadFile http://wsa.roe.ac.uk:8080/wsa/CrossID -O $outFile

and the contents of $uploadFile are shown below.

Cheers Mike

--FILEUPLOAD
Content-Disposition: form-data; name="fileName"; filename="upload_1.txt"
Content-Type: text/plain

180.0 0.0
181.0 0.0
--FILEUPLOAD
Content-Disposition: form-data; name="selectList"

ra,dec,yapermag3,j_1apermag3,hapermag3,kapermag3,sourceID
--FILEUPLOAD
Content-Disposition: form-data; name="emailAddress"


--FILEUPLOAD
Content-Disposition: form-data; name="radius"

1.0
--FILEUPLOAD
Content-Disposition: form-data; name="programmeID"

101
--FILEUPLOAD
Content-Disposition: form-data; name="rows"

0
--FILEUPLOAD
Content-Disposition: form-data; name="whereClause"

(priOrSec<=0 OR priOrSec=frameSetID)
--FILEUPLOAD
Content-Disposition: form-data; name="database"

ukidssdr10plus
--FILEUPLOAD
Content-Disposition: form-data; name="baseTable"

source
--FILEUPLOAD
Content-Disposition: form-data; name="format"

FITS
--FILEUPLOAD
Content-Disposition: form-data; name="nearest"

0
--FILEUPLOAD
Content-Disposition: form-data; name="compress"

GZIP
--FILEUPLOAD--
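
For comparison, a body with the same layout as the upload file above can be assembled by hand in Python using only the standard library. This is a rough sketch: `build_multipart` is a hypothetical helper, the CRLF line endings follow the usual multipart convention rather than anything confirmed for this service, and only two of the form fields are included.

```python
def build_multipart(fields, file_field, filename, file_text, boundary="FILEUPLOAD"):
    """Assemble a multipart/form-data body: the file part first, then the fields."""
    lines = [
        "--" + boundary,
        'Content-Disposition: form-data; name="{0}"; filename="{1}"'.format(
            file_field, filename),
        "Content-Type: text/plain",
        "",  # blank line separates the part headers from the part content
        file_text.rstrip("\n"),
    ]
    for name, value in fields.items():
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="{0}"'.format(name),
            "",
            str(value),
        ]
    lines += ["--" + boundary + "--", ""]  # closing boundary, then a final CRLF
    return "\r\n".join(lines).encode("ascii")


body = build_multipart(
    {"radius": "1.0", "programmeID": "101"},
    file_field="fileName",
    filename="upload_1.txt",
    file_text="180.0 0.0\n181.0 0.0\n",
)
print(body.decode("ascii"))
```

Such a body has to be sent with a matching header, Content-Type: multipart/form-data; boundary=FILEUPLOAD, as in the wget call.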

keflavich commented 3 years ago

OK, that's only a partial solution; the response is now:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>CrossID SQL Query Results</title>

    <script type="text/javascript" src="http://wsa.roe.ac.uk/configurestyles.js">
    </script>
  </head>

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#000080" alink="#FF0000"  >

    <h1>CrossID SQL Query Results</h1>

<p>Parsing input .....
<p>Parsing upload file ...
<p>file.txt uploaded file size: 12 bytes, 1 rows loaded
<p>Programme:

so the data went up, but nothing came back?

bsipocz commented 3 years ago

Yep, I get an IOException. However, the wget suggested above yields the same initial parsing result that we see after fixing the bug we uncovered in the files kwarg.

ipdb>  requests.get(url=url).content                                                                                    
b'<?xml version="1.0" encoding="ISO-8859-1"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 
  <head> 
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
    <title>CrossID SQL Query Results</title> 

    <script type="text/javascript" src="http://wsa.roe.ac.uk/configurestyles.js"> 
    </script> 
  </head> 

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#000080" alink="#FF0000"  > 

    <h1>CrossID SQL Query Results</h1> 

java.io.IOException: Posted content type isn\'t multipart/form-data'

bsipocz commented 3 years ago

The response.url is this one:

'http://horus.roe.ac.uk:8080/vdfs/CrossID?database=VVVDR4&programmeID=120&sys=J&ra=161.26477294&dec=-59.68443085&radius=0.016666666666666666&from=source&formaction=region&xSize=&ySize=&boxAlignment=RADec&emailAddress=&format=VOT&compress=NONE&rows=1&select=default&where=&disp=&baseTable=source&whereClause=%28priOrSec%3C%3D0+OR+priOrSec%3DframeSetID%29&qType=form&selectList=default&uploadFile=file.txt&nearest=1&archive=VSA'
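
A URL like this carries the form fields in its query string, so `uploadFile=file.txt` is only a name there, not the file contents. As a small standard-library sketch, the query string can be decoded back into a dict to compare it field by field with the web form; the URL used here is a shortened excerpt of the one above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Shortened excerpt of the response URL quoted in the thread.
url = ("http://horus.roe.ac.uk:8080/vdfs/CrossID?database=VVVDR4&programmeID=120"
       "&emailAddress=&whereClause=%28priOrSec%3C%3D0+OR+priOrSec%3DframeSetID%29"
       "&uploadFile=file.txt")

# keep_blank_values=True preserves empty fields such as emailAddress.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
for name, values in sorted(params.items()):
    print(name, "=", repr(values[0]))

# Going the other way reproduces the percent-encoding seen in the URL.
encoded = urlencode({"whereClause": "(priOrSec<=0 OR priOrSec=frameSetID)"})
print(encoded)
```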

MikeeRead commented 3 years ago

There seem to have been a couple of successful uploads from 97.126.126.188, but no file output was requested. That shouldn't truncate the output, and I'm not seeing any errors. In other words, the requests seem to work and HTML is returned, but there is no link to download the results because none was requested.

bsipocz commented 3 years ago

Ah, OK, thanks for the pointer; that will hopefully get us to the fix.

keflavich commented 3 years ago

@MikeeRead are we passing the right information to trigger a download, or are you saying that we need to put in a separate request to get the subsequent download? Or is it simply that, for the coordinates we're requesting, there's nothing to download because the results are null?

Or is there really a problem upstream and the results coming back are incorrectly truncated?

Sorry, I'm not sure exactly where we are right now.

MikeeRead commented 3 years ago

My suspicion is that the complete, correct information is not being sent. If it were, you should see a full results page and a link to the results file, if that has been requested.

The service was designed to handle input provided by the web form, so incomplete or possibly extra parameters may be causing it to behave oddly.

Can you provide the latest full list of parameters being passed, and the URL they are sent to?

bsipocz commented 3 years ago

I have this payload dict (there are indeed a few unused elements being passed that I don't see in the web form, view-source:http://horus.roe.ac.uk:8080/vdfs/VcrossID_form.jsp):

{'database': 'VVVDR4',
 'programmeID': 120,
 'radius': 5.0,
 'emailAddress': '',
 'format': 'VOT',
 'compress': 'GZIP',
 'rows': 30,
 'disp': '',
 'baseTable': 'source',
 'whereClause': '(priOrSec<=0 OR priOrSec=frameSetID)',
 'qType': 'form',
 'selectList': 'default',
 'uploadFile': 'file.txt',
 'nearest': 1,
 'archive': 'VSA'}

That results in this URL:

http://horus.roe.ac.uk:8080/vdfs/CrossID?database=VVVDR4&programmeID=120&radius=5.0&emailAddress=&format=VOT&compress=GZIP&rows=30&disp=&baseTable=source&whereClause=%28priOrSec%3C%3D0+OR+priOrSec%3DframeSetID%29&qType=form&selectList=default&uploadFile=file.txt&nearest=1&archive=VSA

MikeeRead commented 3 years ago

Hmm, not sure what's going on: when I take those params and submit them via perl/wget

$paramVal{"database"}="VVVDR4"; # database
$paramVal{"radius"}="5.0"; # search radius
$paramVal{"programmeID"}="120"; # 101 is LAS
$paramVal{"baseTable"}="source"; # source sourceView or detection
$paramVal{"selectList"}="default"; # list of attributes eg ra,dec
$paramVal{"whereClause"}="(priOrSec<=0 OR priOrSec=frameSetID)"; # optional where clause eg (priOrSec<=0 OR priOrSec=frameSetID)
$paramVal{"nearest"}="1"; # 0 all nearby objects or 1 nearest only
$paramVal{"format"}="VOT"; # FITS, VOT or CSV
$paramVal{"compress"}="GZIP"; # GZIP or NONE
$paramVal{"rows"}="30"; # rows returned in html not needed
$paramVal{"emailAddress"}=""; # best not to put in an email as you'll end up with parallel queries "mar\@roe.ac.uk"
$paramVal{"disp"}="";
$paramVal{"archive"}="VSA";
$paramVal{"qType"}="form";
$paramVal{"uploadFile"}="file.txt";

together with the actual coord file contents as a multipart/form-data request, it works and I get the output below (I snipped out the HTML table of results but included the download link).

So the only thing I can think is that the astroquery request is still not formed correctly. Is this being sent async? It should be waiting for the results.

Mike

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

CrossID SQL Query Results

CrossID SQL Query Results

Parsing input .....

Parsing upload file ...

upload_1.txt uploaded file size: 247 bytes, 10 rows loaded

radius: 5.0

Nearest object only:

Programme: VVV: VISTA Variables in the Via Lactea
From: vvvSource
Where: (priOrSec<=0 OR priOrSec=frameSetID)

Data file generating queries can take a bit longer to execute as they write to a file ALL rows returned by the query.

A web link to your generated output file will appear at the bottom of this page.

Using database VVVDR4
QUERY STARTED: Thu Jan 07 09:28:58 GMT 2021   [1 active, 340 total]

Please keep this browser window open and wait for your results to appear below...

Connected to database (Query returned 10 result rows, all rows are shown in the displayed table.)

Download Results File , your results in a gzipped VOTable ASCII file (Contains 10 rows, 1.7 KB)
Launch file in Topcat (requires Java 1.5 and Java Web Start, approx 12Mb download for Topcat application)

QUERY FINISHED: Thu Jan 07 09:29:01 GMT 2021

Click your browsers 'BACK' button to try another query...

keflavich commented 3 years ago

Ah, I think this is the problem: we aren't leaving the connection open. I'll try to figure out how to handle that...

keflavich commented 3 years ago

I'm not really sure how we're supposed to handle this. It looks like VSA is returning a "complete" HTTP response - saying it's done - before it has sent anything past the word Programme:. To re-load the site, we'd need to re-post the file, which I don't think is intended.

Our _check_page function was clearly intended to handle this problem, but it doesn't work when part of the requested data include an uploaded file, since it can only be used to get URLs, not post to them.

> Is this being sent async? It should be waiting for the results.

@MikeeRead, in short, no. I don't know if this is technically possible. I don't think you can just wait on the response of a post request if it sends partial data back immediately. But I must be missing something, since you were able to get this to work.

MikeeRead commented 3 years ago

Well I can only get it to work using wget/browser which seems to wait for the full response to finish/close. I can't remember why I used perl plus calling wget rather than doing it all in perl, I presume I just got it working quicker. I've used the script quite a few times to successfully crossID millions of coords looping through batches of 10,000 or so.

At some level partial data being sent back is the norm if the response contains a lot of data as it can't all be buffered.

Unfortunately there's no proper queuing system set up behind this so if users submitted stuff asynchronously routinely without at least checking and waiting for results, then the server would grind to a halt :(
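
The complete results page quoted earlier ends with a QUERY FINISHED line, so one way for a client to tell a complete response from a truncated one is to read the page line by line until that marker appears. The sketch below is only an illustration of that idea: `collect_until_finished` is a hypothetical helper, and the sample lines are abbreviated from the pages in this thread. With requests, the lines could in principle come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`, but whether that keeps the connection open long enough for this service is untested here.

```python
def collect_until_finished(lines, marker="QUERY FINISHED"):
    """Consume text lines until the marker is seen.

    Returns (page_lines, finished); finished is False if the input ended
    before the marker appeared, i.e. the page looks truncated.
    """
    page = []
    for line in lines:
        page.append(line)
        if marker in line:
            return page, True
    return page, False


# Abbreviated examples of a truncated page and a complete page.
truncated = ["<p>Parsing upload file ...", "<p>Programme:"]
complete = truncated + [
    "QUERY STARTED: Thu Jan 07 09:28:58 GMT 2021",
    "QUERY FINISHED: Thu Jan 07 09:29:01 GMT 2021",
]

print(collect_until_finished(truncated)[1])
print(collect_until_finished(complete)[1])
```

Processing one request at a time and waiting for the marker before sending the next one would also fit the constraint that the server has no queuing system.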