ChristopherWilks / snaptron

fast webservices based query tool for large sets of genomic features

Truncation of results in the presence of very large queries/high number of concurrent queries #11

Open ChristopherWilks opened 5 years ago

ChristopherWilks commented 5 years ago

Seen this on both Stingray and on the cluster mirror.

It typically happens only when queries return a large number of rows (tens of thousands), when the rows themselves are extremely large (for example, coverage queries with tens of thousands of samples in each row), or when concurrency is high (for example, 100 queries all started at the same time).

A form of the error is reported by either Python or curl (curl error 18) as: transfer closed with outstanding read data remaining

However, it's unclear whether the server is failing to transfer the full payload or the client is failing to keep up, or both.

The server doesn't always report an error. OTOH, it's not clear why the client couldn't keep up when running on a server with many cores and plenty of memory. Raising the read buffer size on the client has alleviated the problem in specific cases, but it doesn't always work.
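As a client-side stopgap, a truncated response can be detected and the query retried. Below is a minimal Python sketch, assuming the truncation surfaces as `http.client.IncompleteRead` (what the standard library raises when a body ends before its declared length). The function name `fetch_with_retry`, its parameters, and the linear backoff are hypothetical and not part of Snaptron or any of its clients.

```python
import http.client
import time
import urllib.request


def _default_fetch(url):
    # Read the whole body; a connection closed early raises IncompleteRead.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_with_retry(url, fetch=_default_fetch, max_attempts=3, backoff_s=1.0):
    """Return the full response body, retrying if the transfer is cut short.

    `fetch` is injectable so the retry logic can be exercised without a server.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except http.client.IncompleteRead as err:
            last_error = err
            if attempt < max_attempts:
                # Linear backoff (an assumption): wait longer after each failure.
                time.sleep(backoff_s * attempt)
    # Every attempt was truncated; surface the last error to the caller.
    raise last_error
```

This only hides the symptom. It does not tell us whether the server or the client is at fault, and a query that is truncated every time will still fail after `max_attempts`.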

ChristopherWilks commented 5 years ago

This most recently came up with the snaptron_query function in the recount Bioconductor package. In that case, the queries themselves weren't returning many results (only 185 rows from 100 queries). But the function was calling RCurl with async=TRUE, which means it was trying to query all 100 URLs concurrently. Switching the function to use async=FALSE appeared to fix the issue.
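A middle ground between fully concurrent and fully serial querying is to cap the number of requests in flight. The sketch below shows the idea in Python using a thread pool. It is not how recount or RCurl work, and `fetch_all` and the default `max_workers=4` are illustrative choices, not tested limits for the Snaptron server.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def _default_fetch(url):
    # Plain blocking GET that reads the whole response body.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_all(urls, fetch=_default_fetch, max_workers=4):
    """Fetch every URL with at most `max_workers` requests in flight.

    Results come back in the same order as `urls`. Setting max_workers=1
    gives serial behaviour, comparable to async=FALSE.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first failure.
        return list(pool.map(fetch, urls))
```

With 100 URLs and `max_workers=4`, the server sees at most four simultaneous connections from this client instead of 100.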

lcolladotor commented 5 years ago

I added the async argument to snaptron_query() in recount, in both the bioc release (3.8, v1.8.2) and devel (3.9, v1.9.2) branches, to address this issue. I left async = TRUE as the default, but users can turn it off if they encounter this problem later on.

Thanks for looking into this Chris!

(Cc'ing @emilyburke who found the issue to begin with)

ChristopherWilks commented 5 years ago

thanks @lcolladotor!