esi / esi-issues

Issue tracking and feature requests for ESI
https://esi.evetech.net/
209 stars 23 forks source link

Of identifying the array responses' items #983

Open guiguilechat opened 6 years ago

guiguilechat commented 6 years ago

I want to make a meta cache compiler (in java) for the ESI. The first part is some explanation because I think context is required to understand my questions.

Code part I did


So far I have the meta cache compiler for paths that return a single element ( https://github.com/guiguilechat/EveOnline/blob/master/model/ESI-compiler/src/main/java/fr/guiguilechat/eveonline/model/esi/Compiler.java find createCache l736 )

It's very simple : when requested for the resource get_path?q=Y, if you already have the holder for this path, typically through a map of <Y, holder>, you return this holder ; else you create the self-scheduling future that will fetch the resource and store it in a holder, put it in the map, and return that holder.

Simple, nice, there are a few bugs but the idea is good I think. It means that this method both handle the cache AND the "callback" function required to have heavy parallel requests

Now what can I do for it when the response is an array of T ? well when the response is an item, I know that when this item changed, well THIS item changed. When I ask for The Forge market orders, basically every 5 minute I will receive a different array of 3000 orders.

What I would like

Still some code example, sorry.

In most cases the array of items returned have a unique attribute. eg each marketorder has a"order_id" attribute that is unique, meaning I can make a map<id, order> as the response handler instead of an array of orders in a holder.

ATM the "unique" is only referenced in the description. I need to iterate over the fields, and find out if one of them has the "Unique" or "unique" keyword in its description. If the number of fields with "unique" is not 1, I fail the cache creation (because returning a holder of marketOrder[] is completely useless wrt performances). If it's 1, then I just make a cache of map<id, order>, and when a modification in the order list is fetched, for each order I check if the map already have this order id stored, if true I compare the new order to the stored and if they differ I replace it. Which mean, only the orders that differ from the cached version are actually changed in the cache.

This leads to my question:

Question

  1. Is the uniqueness standardized somewhere ? I searched a little in the swagger spec but could not find it (but I aint expert). I think relying on the description is a bit weird. So
  2. Is the Unique keyword in the description enforced by CCP ? To me it looks like, and if there is no uniqueness standard in swagger this seems the best option, but I need to know the exact mechanism and constraints ; eg can I expect the "unique" keyword to only appear in at most one field ?
deadlybulb commented 6 years ago

The OpenAPI Schema spec doesn't provide an explicit provision for tagging unique fields (although you can tag an array as having unique items). It is possible to extend the spec similar to HTTP header conventions, e.g. x-your-custom-tag. So CCP would have to choose to do that.

I have a similar problem for EveKIt. I've discovered unique fields for all the ESI data I store through hard fought trial/error and asking questions. Depending on how you "flatten" your data, uniqueness only comes in some cases by combining multiple fields into a composite key. I'm happy to document my list of unique keys somewhere if that's helpful.

EricE commented 6 years ago

Please document your findings!

guiguilechat commented 6 years ago

@deadlybulb thank you very much, that covers my question with sourced informations.

@EricE what do you mean ?

EricE commented 6 years ago

@guiguilechat sorry I wasn't clearer. Comment was for @deadlybulb in response to the last sentence of his post, "I'm happy to document my list of unique keys somewhere if that's helpful."

deadlybulb commented 6 years ago

I'll submit a PR in a few days and we can figure out where to put it.

guiguilechat commented 6 years ago

I think there is an issue with this notion

on https://esi.evetech.net/ui/#/Bookmarks/get_characters_character_id_bookmarks

I think the bookmark_id should be unique in the response 200 value. ATM description is "bookmark_id integer", if I'm right it should be "Unique bookmark_id integer"

I know this is a weird request but my cache compiler translates it as non-unique field, thus does not handle the modification of one BM (instead it considers all BMs are removed and new BMs are added)

The same issue applies to https://esi.evetech.net/ui/#/Bookmarks/get_corporations_corporation_id_bookmarks

also the folder_id of https://esi.evetech.net/ui/#/Bookmarks/get_characters_character_id_bookmarks_folders and https://esi.evetech.net/ui/#/Bookmarks/get_corporations_corporation_id_bookmarks_folders

Aidansavage commented 6 years ago

I know this is a weird request but my cache compiler translates it as non-unique field, thus does not handle the modification of one BM (instead it considers all BMs are removed and new BMs are added)

Isnt that something to be fixed by your compiler, whether upstream or by telling it which fields to treat as unique?

guiguilechat commented 6 years ago

The compiler is supposed to understand from the swagger which fields are unique. Specifying this information by hand defeats the goal of the swagger.

Typically my compiler iterates over the response's fields, to find out how many of them have a description that starts with "u|Unique ".

The list of fields detected as "unique" is used to know whether the cache should return a list (no or more than 1 unique), or a map(one unique).

If I produce a list, I then create the cache container as an observablelist, which I put the results into when retrieving them from the ESI. If I produce a map, I transform the array of results from the ESI into a temporary map with the key being the unique field, and the values being the ESI array's items, which I then merge into the returned map.

That's why I asked the Q2 in my OP : "Is the Unique keyword in the description enforced by CCP ?"

guiguilechat commented 6 years ago

I think the best and easiest way to solve this, would be to use dict instead of array type.

I think swagger allows to return dict type, though only with a key type of string : https://swagger.io/docs/specification/data-models/dictionaries/

In the case of get_characters_character_id_bookmarks, instead of having an array of bms returned, we would have a object, with additional property type is bm. The key being the id of the bm in plain text.

Another example is for TheForge orders : instead of returning a plain list of unordered orders, this would return a dictionnary of orderid -> order

guiguilechat commented 6 years ago

The response for get_characters_character_id_bookmarks_ok is

          "description": "200 ok array",
          "items" : TYPEDESCRIPTION,
          "maxItems": 1000,
          "title": "get_characters_character_id_bookmarks_ok",
          "type": "array"

, with TYPEDESCRIPTION being the descriptiopn of the structure with title get_characters_character_id_bookmarks_200_ok

The only change required, I THINK, is to replace two lines, to get

          "description": "200 ok array",
          "additionalProperties" : TYPEDESCRIPTION,
          "maxItems": 1000,
          "title": "get_characters_character_id_bookmarks_ok",
          "type": "object"

This is the change affecting the swagger.json, of course the esi server needs to be adapted.

guiguilechat commented 6 years ago

Having a list of key-value couples would also open up the transmission of incremental data, since we can clearly identify the modifications between previous and present result.

right now if my path returns

Etag : e1
ret : 
  r1, r2, r3

And next modification returns

Etag : e2
ret : 
  r2, r3, r1

This means, data are the same ; but the array is in a different order so I can't affirm they are the same. With my modification it becomes possible to sort the returned dict by key, which means first and second return both become

Etag : e1
ret : 
  {k1:r1, k2:r2, k3:r3}

Now if we have a modification between etag 1 and etag 2 , that is small enough, we would want to return not the full dict (keep in mind we may talk about 1000 values for TheForge market), but instead the incremental modification wrt the previous etag.

example : first return is still same as previous. Then on etag 2 the only difference is, r2 has been removed .

ATM the response 2 will be eg. with very small data.

Etag : e2
ret : 
  r3, r1

While if data is put in dict, and sorted by key instead, it becomes possible to return

Etag : e2
increment : e1
ret : 
  {r2:}

Which means the increments compared to e1 is only the deletion of r2 from the result. Of course if someone transmits his request with if-non-match:e0 while e1 is previous, the whole page is returned (we only cache the increment from previous etag to present etag) :

Etag : e2
ret : 
  {k1:r1, k3:r3}

This also means we need a specific header from user to return the increment instead of the full data.

In the end, even though it may look like the size of data to transmit is increased, with incremental transmission

guiguilechat commented 5 years ago

an example of issue because of this. This is, on any language, assuming you convert the json directly into the correct type.

ATM if I want the insurance price /cost of a ship, let's say the avatar ( id 11567 ). I fetch https://esi.evetech.net/latest/insurance/prices/?datasource=tranquility&language=en-us This gives me the list of insurance possibilities. So I need to iterate over half (average) the insurances and check if the typeid is the correct one. If it was a map , with the text of the id as key, then I could just get it with one deref method (eg response["11567"]). By now the correct use of this path, to get the insurance of a ship, is to translate the list into a map and store it to avoid reparsing the whole list each time.

If you correct this path, that will save all devs that want to use it the need to do so. (on a side note, the values of the dict must have same structure as now, because the key is a string, not a number)

guiguilechat commented 5 years ago

It's not a question (edit : it was at the beginning). It's a design bug (edit : after discussion and documentation I came to this conclusion).

When an array's item have a unique attribute among the array, then this array IS a dict. It's like I ask a seller what does he sell and he tells me it's atoms. Dude I know you sell fruits, just tell me the kind of orange you sell.

What's more, this is a hinder to caching the data. Right now I have 2000+ bps. The get_blueprints gives me a LIST of 2000 BPs. So when the cache expires and I fetch the new one, I have 2000 new ones and I have no idea which one has been modified so I need to remove all of them from MY cache and then add them again here I do it in the generated code, which triggers all my removal/add hooks ; and then add hooks on this auto-generated cache to make a hand-made cache to correct the structure done here. If the swagger was specifying the map correctly I would not need to do it by hand and the generated cache would not trigger 2000 notifications when ONE bpo goes from "available" to "in use".

edit : I'm wrong, it only triggers twice for each hook on the generated cache, one from the clear() and the second from the addAll(), because observablelist handle a-whole events (one event for all removed). However it triggers 4000 times on the "correct" map : once for each bpo removed, and once again for each bpo added.