SuperEvilMegacorp / vainglory-assets

Community provided art, schemas, and other assets that make using the Vainglory API easier
https://developer.vainglorygame.com
MIT License

Feature Request: Remove startsAt and endsAt filter restrictions #288

Closed · schneefux closed this issue 7 years ago

schneefux commented 7 years ago

What does that mean in practice? Should I query 4 * ~30 days if I want all data? Or do you keep 4 full-month shards? Or only 3 and a half if the current month isn't over yet? Will you change the behavior one day?

How do I implement the logic in my client? How do I keep my code current?

In combination with #249, #248, #271 and #236, my code is turning into a mess. It should be KISS. For reference, here are the relevant parts of my code:

```js
// API requires a maximum time interval of 4 weeks (28 days)
// return jobs [payload, payload, …] with fixed createdAts
getSplitGrabs(payload, start, end) {
    let last_start = new Date(start.getTime()),  // last match in database or static default from environment variable
        payloads = [];
    // loop forwards, adding 4 weeks until we pass `end`
    while (last_start < end) {
        /* API does not always support full-month intervals
        let this_end = new Date(last_start.getTime() - 1000);
        this_end.setMonth(this_end.getMonth() + 1);
        */
        let this_end = new Date(last_start.getTime() + (4 * 60*60*24*7*1000));
        let this_start = new Date(last_start.getTime());
        // query from `this_start` to `this_start + 4 weeks`, hitting the full window

        if (this_end > end) this_end = end;  // API errors if createdAt-end is later than now()
        let pl = JSON.parse(JSON.stringify(payload));
        pl["params"]["filter[createdAt-start]"] = this_start.toISOString();
        pl["params"]["filter[createdAt-end]"] = this_end.toISOString();
        payloads.push(pl);
        // add +1s so we do not overlap start & end and move into the next window
        last_start = new Date(this_end.getTime() + 1000);
    }
    return payloads;
}
```
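
For reference, a minimal usage sketch of the splitter above; the `grabber` instance, the player filter, and the concrete dates are made up for illustration:

```js
// Hypothetical usage: split a three-month range into <= 28-day windows.
// `grabber` stands in for whatever object exposes getSplitGrabs().
const start = new Date("2017-06-01T00:00:00Z");
const end = new Date("2017-09-01T00:00:00Z");
const basePayload = { params: { "filter[playerNames]": "somePlayer" } };

const jobs = grabber.getSplitGrabs(basePayload, start, end);
// -> 4 payloads, each with filter[createdAt-start] / filter[createdAt-end]
//    covering consecutive 4-week windows, the last one capped at `end`.
jobs.forEach((job) => console.log(job.params["filter[createdAt-start]"],
                                  job.params["filter[createdAt-end]"]));
```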

`payloads` is forwarded to a microservice that uses the payload to call `/matches`:

```js
// loop over all pages, yield jsonapi parsed data results
// TODO make it an async generator? coroutine?
module.exports.requests = async (endpoint, region, options, logger) => {
    let url = api.url(endpoint, region),
        result = [];
    do {
        const [data, links] = await api.request(url, options, logger);
        if (data == undefined) break;   // 404
        if (links == undefined) break;  // 404
        url = links.next;
        result.push(...data);
        if (links.next == undefined) break;  // last page
    } while (true);
    return result;
}
```

`api.request`:

```js
// send a request to url with options, log data about it
// return [jsonapi parsed data, jsonapi links]
module.exports.request = async (url, options, logger) => {
    let response;
    while (true) {
        try {
            const opts = {
                uri: url,
                headers: {
                    "X-Title-Id": "semc-vainglory",
                    "Authorization": MADGLORY_TOKEN
                },
                qs: options,
                json: true,
                gzip: true,
                time: true,
                forever: true,
                timeout: API_TIMEOUT*1000,
                strictSSL: true,
                resolveWithFullResponse: true
            };
            logger.info("API request", {
                uri: opts.uri,
                qs: opts.qs
            });
            response = await request(opts);
            return [jsonapi.parse(response.body), response.body.links];
        } catch (err) {
            response = err.response;
            if (err.statusCode == 429) {
                logger.warn("rate limited, sleeping");
                await sleep(100);  // no return, no break => retry
                continue;
            } else if (err.statusCode != 404) {
                logger.error("API error", {
                    uri: url,
                    qs: options,
                    status: err.statusCode,
                    error: response ? response.body : err
                });
                fs.appendFileSync(ERROR_LOG, JSON.stringify(err) + "\n");
                throw err;  // rethrow for services to catch and handle
            }
            logger.warn("not found", {
                uri: url,
                qs: options,
                status: err.statusCode
            });
            return [undefined, undefined];
        } finally {
            if (response != undefined) {  // else non-requests error
                logger.info("API response", {
                    uri: url,
                    qs: options,
                    status: response.statusCode,
                    connection_start: response.timings.connect,
                    connection_first: response.timings.response,
                    connection_end: response.timings.end,
                    ratelimit_remaining: parseInt(response.headers["x-ratelimit-remaining"])
                });
            }
        }
    }
}
```
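
For completeness, the snippet above leans on a few requires, constants, and helpers that are not shown. A guess at what they look like; the module choices and defaults are assumptions, not the author's actual code:

```js
// Assumed context for the snippet above (guesses, not the author's exact setup):
const request = require("request-promise");  // matches opts like qs, gzip, resolveWithFullResponse
const jsonapi = require("jsonapi-parse");    // or whichever module provides jsonapi.parse(body)
const fs = require("fs");

const MADGLORY_TOKEN = process.env.MADGLORY_TOKEN;            // API key
const API_TIMEOUT = parseInt(process.env.API_TIMEOUT) || 10;  // seconds
const ERROR_LOG = process.env.ERROR_LOG || "error.log";

// minimal sleep helper used for the 429 backoff
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```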

That is 100 lines just to get a few matches from the API. (Granted, a lot of it is logging, comments and abstraction.)

Can you consider paginating the responses server-side using the links attribute? That would give you control over the number of shards a single request hits, and it would prevent clients from provoking 500s and wasting requests with broken pagination logic. Instead of erroring when a request exceeds the 28-day limit, you could paginate it, which removes another quirk of the API.
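
To illustrate what that would buy the client: under the assumption that the server paginated an arbitrary range itself via `links.next`, the splitting logic above could go away and a single call through the `requests` helper would be enough. The endpoint, region, dates, and `page[limit]` below are placeholders:

```js
// Hypothetical client code if the API paginated long ranges itself:
// one request with the full window; the loop in `requests` (shown above)
// keeps following links.next until the server stops providing one.
// (Must run inside an async function.)
const matches = await requests("/matches", "eu", {
    "filter[createdAt-start]": "2017-01-01T00:00:00Z",
    "filter[createdAt-end]": new Date().toISOString(),
    "page[limit]": 50
}, logger);
```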

svperfecta commented 7 years ago

Hey @schneefux, you're confusing filter parameters with paging. They're two different things.

We already support and currently provide pagination links in all API responses. All good!

With that said, we consider any request with a window longer than 28 days invalid because the range is too large. This isn't weird or unusual; it's fairly common to provide both defaults and limitations on query parameters.

I think where things are going wrong for you is that you're trying to pull all data for a player, ever. If that's the case, as you noted, you'll need a loop within a loop.

I don't think this is really that much code; only a few lines of it have anything to do with pagination. The rest of it is formatting the request and dealing with errors.

svperfecta commented 7 years ago

PS: If it were me... I'd do this differently. I'd keep an array or queue of requests and just push entries onto it. First, I'd push one query for each month you want to search. Then, based on the response, if a pagination link was provided, I'd push that onto the queue too. Really small amount of code. I think part of the problem is the way you've broken up the various services.
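
A rough sketch of one way to read that suggestion, reusing `api.url` and `api.request` from the snippets above; the function name and the `monthlyFilters` argument (one filter object per month) are assumptions:

```js
// Hypothetical queue-based approach: seed one job per month, and whenever a
// response carries a pagination link, push a follow-up job onto the queue.
async function fetchAll(endpoint, region, monthlyFilters, logger) {
    const queue = monthlyFilters.map((filter) => ({
        url: api.url(endpoint, region),
        options: filter  // e.g. { "filter[createdAt-start]": ..., "filter[createdAt-end]": ... }
    }));
    const results = [];
    while (queue.length > 0) {
        const job = queue.shift();
        const [data, links] = await api.request(job.url, job.options, logger);
        if (data !== undefined) results.push(...data);
        if (links !== undefined && links.next !== undefined) {
            queue.push({ url: links.next, options: job.options });  // follow pagination later
        }
    }
    return results;
}
```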

schneefux commented 7 years ago

What else is the use case of the API, if not getting all the data for a given set of filters in a specific time period? Dates shouldn't be a filter on your end; the entire pagination logic should be based on time series - that is how you handle data retention, that is how you shard your data, that is what clients are interested in, and so on. In my opinion, the "pagination pagination", i.e. page[limit] and page[offset], should be replaced by a dynamic time series cursor, so that your query does not look like this (pseudo-SQL):

```sql
SELECT * FROM matches_eu WHERE createdAt BETWEEN filterCreatedAt-start AND filterCreatedAt-end LIMIT pageLimit OFFSET pageOffset
```

but is instead

```sql
SELECT * FROM matches_eu WHERE createdAt BETWEEN createdAt-start AND createdAt-end LIMIT 50
```

with `createdAt-start` and `createdAt-end` being a moving time window between `filter[createdAt-start]` and `filter[createdAt-end]`, passed as state to the client in the `links` object.
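
To make the idea concrete, a minimal server-side sketch of such a cursor; the table name and the LIMIT come from the pseudo-SQL above, while the `db.query(sql, params)` client, the `page[cursor]` parameter name, and the link format are assumptions:

```js
// Hypothetical server-side handler: links.next carries a moving createdAt cursor
// instead of page[offset], so every page is a cheap range scan.
async function matchesPage(db, filterStart, filterEnd, cursor /* ISO string or null */) {
    const where = cursor
        ? "createdAt > ? AND createdAt <= ?"   // strictly after the cursor to avoid repeats
        : "createdAt >= ? AND createdAt <= ?"; // first page includes the range start
    const rows = await db.query(
        `SELECT * FROM matches_eu WHERE ${where} ORDER BY createdAt ASC LIMIT 50`,
        [cursor || filterStart, filterEnd]
    );
    const last = rows.length === 50 ? rows[rows.length - 1].createdAt : null;
    return {
        data: rows,
        links: last ? {
            next: `/matches?filter[createdAt-start]=${filterStart}` +
                  `&filter[createdAt-end]=${filterEnd}&page[cursor]=${last}`
        } : {}
    };
}
```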

You don't risk needing large offsets, which is better for performance and removes client-side restrictions. (I'm making assumptions about your database based on your past statements here and in the Discord chats.)