NOAA-PMEL / LAS

Live Access Server
https://ferret.pmel.noaa.gov/LAS/
The Unlicense

Need some ideas for how to limit large selections from "scattered" data sources. #356

Open karlmsmith opened 6 years ago

karlmsmith commented 6 years ago

Reported by @noaaroland on 11 Jan 2008 15:11 UTC When a large selection is made from the Carbon database the service collapses and no proper error message is returned. See: #336 for the solutions.

However, this is a general problem for other data sources including and especially the Tabledap service. Some ideas:

  1. Add capability to the time widgets to limit the time range that can be selected.
    • This could be too restrictive, since a selection over the full time range might be fine when other constraints are applied.
    • Might get around that objection by only limiting the time selectors when no constraints are active.
  2. Use a similar technique as the database:
    • If a row_limit is set, first run a count query to find how many rows would be returned, and only make the final selection if that count is less than the limit.
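The count-before-select idea in option 2 can be sketched as follows. This is a minimal illustration, not LAS code; the `observations` table, column names, and default limit are all hypothetical.

```python
import sqlite3


def fetch_with_limit(conn, where_clause, params, row_limit=50_000):
    """Run a cheap COUNT(*) first; fetch rows only if the count is under the limit.

    `observations` and `row_limit` are placeholders for whatever table and
    configured limit the real service would use.
    """
    count_sql = f"SELECT COUNT(*) FROM observations WHERE {where_clause}"
    (n,) = conn.execute(count_sql, params).fetchone()
    if n > row_limit:
        # Fail fast with a meaningful message instead of letting the
        # full selection collapse the service.
        raise ValueError(
            f"request would return {n} rows (limit {row_limit}); "
            "please narrow your selection")
    data_sql = f"SELECT * FROM observations WHERE {where_clause}"
    return conn.execute(data_sql, params).fetchall()
```

The point is that the count is cheap relative to materializing the full result, so the service can reject oversized requests quickly and with a proper error message.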

Migrated-From: http://dunkel.pmel.noaa.gov/trac/las/ticket/350

karlmsmith commented 6 years ago

Modified by @noaaroland on 11 Jan 2008 15:13 UTC

karlmsmith commented 6 years ago

Comment by Bob.Simons on 11 Jan 2008 16:46 UTC Replying to [ticket:350 Roland.Schweitzer]:

With this type of data, it is very hard or impossible for tabledap or LAS to know ahead of time how much data (how many rows) will be in the response. And in any case, ideally, I don't think tabledap or LAS should put an artificial limit on the amount of data the user requests (unless there are other constraints like bandwidth costs). If the user requests a lot of data, tabledap and LAS should return it (perhaps taking a long time, and while still allowing requests from other users to be processed).

The problem is: after the long wait, is the request going to succeed or fail? There is no way for tabledap or LAS to know. Only the data provider knows. If the request succeeds, the long wait was a reasonable price to pay. But in the case that prompted this ticket, it is tabledap making one call to the opendap library to get data from the source that takes 16 minutes, then reports its failure. I have no solution for this (although I will look into changing the number of retries in the opendap library). The user will be disappointed. But maybe that is the best we can do.

In the case that prompted this ticket, the real failure is at the original data source. It should be able to know whether it can or can't respond to a request. If it can't, it should fail quickly. It is very hard for tabledap or LAS to step in and provide this capability, especially if the goal is to not prevent valid but large requests. But if the solution could be provided, it would be most useful if it were provided at the lowest possible level: ideally at the original data source (so all clients benefit); tabledap is next best; and LAS is worst (because only LAS benefits).

So maybe the solution is to work with each data provider to improve their service by quickly detecting requests that will ultimately fail.

And a simple but imperfect solution for LAS is to present the dataset to the user with one or more default constraints (e.g., Tabledap's interface always initially has a constraint to get just the last week's worth of data).
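The default-constraint idea can be sketched as building the initial tabledap request with a trailing-week time constraint already applied. This is a hedged illustration only; the base URL and dataset id are made up, and the exact query syntax the real server expects may differ.

```python
from datetime import datetime, timedelta, timezone


def default_tabledap_url(base, dataset_id, variables, days=7):
    """Build a tabledap .csv request constrained to the last `days` days,
    so the user's first request never covers the whole record.

    `base` and `dataset_id` are hypothetical placeholders.
    """
    start = datetime.now(timezone.utc) - timedelta(days=days)
    constraint = f"time>={start.strftime('%Y-%m-%dT%H:%M:%SZ')}"
    return f"{base}/tabledap/{dataset_id}.csv?{','.join(variables)}&{constraint}"
```

The user can still widen the time range deliberately; the default just keeps an accidental first click from requesting the full dataset.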

karlmsmith commented 6 years ago

Comment by @noaaroland on 11 Jan 2008 17:17 UTC I like Bob's idea of a default interval. I created a ticket for the date widgets to add this. See #352.

karlmsmith commented 6 years ago

Comment by steven.c.hankin on 11 Jan 2008 17:57 UTC 1) Bob's idea that the user interface widgets should by default be pre-set to a state that will lead to a valid product request is a good one. And one that has been discussed for many other datasets as well. There were discussions in the past that the LAS XML for axis descriptions should contain elements describing the recommended initial selection for that axis.

2) As the opening of this discussion stated, there are approaches to solving this problem through the UI (just discussed) and there are potential approaches at the "database" (SQL, TableDAP or whatever) query level. In many cases (though admittedly not all) it is possible to develop a heuristic algorithm that can anticipate the approximate volume of data that will be returned. In the carbon server, for example, we know a lot about the spatial and temporal density of the data. I think we should consider an Ajax-style query through which the UI can learn if such a heuristic exists and invoke it if it does. This would allow for "intelligent" user interface behavior on a case-by-case basis. The heuristic algorithm could presumably be encoded in a Velocity template, so this functionality would be available through a standard LAS product -- little new machinery needed. Mostly some new properties available in the XML that give the product id of the heuristic calculation.
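The heuristic pre-check described above could look roughly like the following. This is a sketch under assumed inputs: the bounding box, duration, and a known sampling density (rows per square degree per day), with a made-up limit; a real implementation would live server-side (e.g., in a Velocity-templated product) and be queried by the UI before the full request is issued.

```python
def estimate_rows(lon_range, lat_range, days, density_per_deg2_per_day):
    """Heuristic row estimate: bounding-box area x duration x known density."""
    area = (abs(lon_range[1] - lon_range[0])
            * abs(lat_range[1] - lat_range[0]))
    return area * days * density_per_deg2_per_day


def selection_ok(lon_range, lat_range, days, density, limit=100_000):
    """What an Ajax-style pre-check could return to the UI: True if the
    estimated volume is acceptable, False if the user should narrow the
    selection before the real request is made."""
    return estimate_rows(lon_range, lat_range, days, density) <= limit
```

Because the density figure is dataset-specific (as with the carbon database, where the spatial and temporal density is well known), the UI would only invoke the check when the dataset's XML advertises a heuristic product id, giving the case-by-case behavior described above.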

karlmsmith commented 6 years ago

Modified by @noaaroland on 6 Jan 2011 03:21 UTC