VertNet / webapp

VertNet web application

Truncated download results #645

Open: tucotuco opened this issue 6 years ago

tucotuco commented 6 years ago

From Scott Chamberlain:

"The user is using rvertnet::bigsearch, the interface to your download service. He was getting only 33K records (exactly that many, which makes it look especially like a hard limit) for a search on class="Aves", while he gets 210K records for class="Aves" + inst="UMMZ". Surely the first query should return a larger set of data than the second. So we're wondering if there's some kind of limit that is sometimes imposed and sometimes not, because if it were always imposed, he would get only 33K for both of those queries."
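The reasoning in the report is a containment check: a broader query can never legitimately return fewer records than a narrower query it contains. A minimal sketch of that check in Python; the function name `looks_truncated` is hypothetical and not part of rvertnet or the VertNet API.

```python
def looks_truncated(broad_count: int, narrow_count: int) -> bool:
    """Hypothetical check: True when a broader query (e.g. class="Aves")
    returns fewer records than a narrower query it contains
    (e.g. class="Aves" + inst="UMMZ"). With complete results the broader
    count is always >= the narrower one, so this signals missing records."""
    return broad_count < narrow_count


# The counts from the report: ~33K for Aves vs 210K for Aves + UMMZ.
print(looks_truncated(33_000, 210_000))
```

A True result only shows that the broader download is incomplete; it does not say where the records were lost.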

tucotuco commented 6 years ago

There is a hard limit, but it comes from a Google Cloud Storage concatenation limit of 1024 files. We write files of 1000 records each and join them to make the final download file, so with our current approach the limit is 1024 × 1000 = 1,024,000 records. Our reasoning is that for anything bigger, people should use the snapshots to avoid excessive costs to us. We'd have to look back through the logs and Google Cloud Storage to figure out why the Aves query (which WOULD fail to return all desired records anyway) stops at 33K records.
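The chunk-and-concatenate scheme described above can be modeled in a few lines of Python. This is a sketch, not the VertNet implementation: the constants restate the figures given in this comment (1000 records per file, 1024 files per concatenation), and the helper `plan_download` is hypothetical.

```python
import math

# Figures stated in this thread; treated here as assumptions, not as
# documented constants of the download service or of Google Cloud Storage.
RECORDS_PER_CHUNK = 1000         # records written to each intermediate file
MAX_COMPOSE_COMPONENTS = 1024    # concatenation limit cited above
MAX_RECORDS = RECORDS_PER_CHUNK * MAX_COMPOSE_COMPONENTS  # 1,024,000


def plan_download(total_records: int) -> dict:
    """Hypothetical helper: how many chunk files a result set needs, how
    many can be concatenated, and how many records the final file holds."""
    chunks_needed = math.ceil(total_records / RECORDS_PER_CHUNK)
    chunks_used = min(chunks_needed, MAX_COMPOSE_COMPONENTS)
    # Anything beyond the concatenation limit is dropped from the download.
    delivered = min(total_records, MAX_RECORDS)
    return {
        "chunks_needed": chunks_needed,
        "chunks_used": chunks_used,
        "records_delivered": delivered,
        "truncated": delivered < total_records,
    }


print(plan_download(210_000))    # fits: 210 chunks, nothing dropped
print(plan_download(2_000_000))  # capped at 1024 chunks / 1,024,000 records
```

Under this model a result set that exceeds the limit is cut at exactly 1,024,000 records, so a download that stops at 33K is not explained by the concatenation limit alone; that is the open question in this thread.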