JPLMLIA / pdsc

Planetary Data System Coincidences
https://jplmlia.github.io/pdsc/

Large HTTP queries are slow #9

Closed garydoranjr closed 4 years ago

garydoranjr commented 4 years ago

Kiri says:

I've noticed that my scripts using the http client for PDSC seem to take a long time to get a response from their queries (e.g., 1.5 mins). Is this expected? (I know there's a lot of data, but just curious.)

It also appears that I am getting partial/truncated results when I've run several jobs in parallel - is there a limit on the number of active queries? If so, I would expect it to block/wait until a previous one finishes (but return a complete result) or to return an error (with 0 results). Insights/guidance welcome!

garydoranjr commented 4 years ago

I looked into this more; I guess the queries are just slow when there are a lot of results to serialize from DB rows -> Python Objects -> JSON -> Python Objects. The DB query itself only takes a few seconds in the worst case.
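To get a rough feel for where the time goes in that chain, the stages can be timed separately. The following is only a sketch using the standard library: the `Row` class, its fields, and `time_round_trip` are hypothetical stand-ins, not PDSC's actual classes or code, and the synthetic rows are far simpler than real metadata records.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class Row:
    # Hypothetical stand-in for a metadata record (not a PDSC class).
    observation_id: str
    latitude: float
    longitude: float


def time_round_trip(n_rows):
    """Time each stage of rows -> objects -> JSON -> objects."""
    # Synthetic tuples playing the role of rows returned by the DB query.
    raw_rows = [(f"OBS_{i:07d}", i * 0.001, -i * 0.001) for i in range(n_rows)]
    timings = {}

    # Stage 1: DB rows -> Python objects (server side).
    start = time.perf_counter()
    objects = [Row(*row) for row in raw_rows]
    timings["rows_to_objects"] = time.perf_counter() - start

    # Stage 2: Python objects -> JSON text (server side).
    start = time.perf_counter()
    payload = json.dumps([asdict(obj) for obj in objects])
    timings["objects_to_json"] = time.perf_counter() - start

    # Stage 3: JSON text -> Python objects (client side).
    start = time.perf_counter()
    decoded = [Row(**item) for item in json.loads(payload)]
    timings["json_to_objects"] = time.perf_counter() - start

    return decoded, timings


if __name__ == "__main__":
    _, stage_times = time_round_trip(200_000)
    for stage, seconds in stage_times.items():
        print(f"{stage}: {seconds:.3f} s")
```

The absolute numbers depend on the machine and on how large each record is, so this only indicates which stage dominates, not how long a real PDSC query should take.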

garydoranjr commented 4 years ago

I’ve made a change to the server that might fix the truncated responses. Apparently CherryPy ignores “socket timeout” errors, so it silently drops the connection after 10 seconds. If you’re issuing large queries (or there’s a lot of network traffic from multiple connections), it might take a while to get all of the data across. I’ve increased the timeout to accommodate larger queries. Let me know if you still observe the behavior.
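For reference, a change of this kind can be expressed through CherryPy's `server.socket_timeout` setting, which defaults to 10 seconds. This is a minimal sketch, not the actual commit: the value of 120 seconds and the names `SERVER_CONFIG` and `apply_server_config` are assumptions for illustration.

```python
# Assumed value for illustration; the timeout chosen in the real fix
# is not stated in this thread.
SERVER_CONFIG = {
    # Seconds CherryPy waits on a socket before timing out (default: 10).
    "server.socket_timeout": 120,
}


def apply_server_config(config=SERVER_CONFIG):
    """Apply the settings to CherryPy's global config before the server starts."""
    # Imported here so the settings can be inspected without CherryPy installed.
    import cherrypy

    cherrypy.config.update(config)
```

Calling `apply_server_config()` before the server is started would raise the limit for all connections, so a long-running transfer is less likely to be cut off partway through.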