glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Strategy for server side pagination #245

Closed ReneRanzinger closed 1 year ago

ReneRanzinger commented 1 year ago

How many web services do we need? How does this link to the details service that is called at the beginning? If the paginated data is omitted from the details response, how does the frontend notice that there is no data at all?

Blocker for:

ReneRanzinger commented 1 year ago

Proposed strategy

The details pages do not include data from server-side paginated sections. Instead, the details JSON has an "additional_data" property, an array similar to the "tool_support" array on glycan details, which lists every server-side paginated section together with its number of records. The frontend loads the details JSON and, based on this array, triggers webservice calls to load the first "page" of each server-side paginated data table. If the number in the "additional_data" array for a data type is 0, no webservice call is triggered for it.
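
A minimal sketch of the frontend decision described above, written in Python for brevity. The layout of the "additional_data" entries (a `section` name plus a `count`), the function name, and the sample counts are assumptions for illustration only; the proposal does not fix them.

```python
from typing import Dict, List


def sections_to_load(details: Dict) -> List[str]:
    """Return the paginated sections whose first page should be requested."""
    to_load = []
    # Hypothetical entry layout: {"section": <name>, "count": <number of records>}
    for entry in details.get("additional_data", []):
        # A count of 0 means there is no data, so no webservice call is needed.
        if entry.get("count", 0) > 0:
            to_load.append(entry["section"])
    return to_load


# Illustrative details JSON; the counts are made up.
details = {
    "additional_data": [
        {"section": "glycosylation", "count": 55},
        {"section": "phosphorylation", "count": 0},
        {"section": "publication", "count": 12},
    ]
}
print(sections_to_load(details))
```

With this shape, an empty or missing "additional_data" array simply results in no follow-up calls.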

There could be (a) one webservice for each server-side paginated data type or (b) one general webservice that takes additional parameters. Although it means more work and more webservices, my preference is (a). It is easier to document (for external users), and the JSON schema for the response can be specific to the data type, which allows better (automated) error checking on the response. It also makes sorting less complex, since you can only sort by properties present in the current table.

Assuming (a): all webservices need the ID for the details page (GlyTouCan ID, UniProt accession), the offset, the limit, the sort-by property, and the sort direction. The response should include the position metrics (total number, offset, limit) and the data records.
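
The request and response described above could be sketched as follows. This is only an illustration over an in-memory list: the function name, the `site` field, and the 1-based record offset are assumptions (the offset convention mirrors the `"offset":1` used in the examples later in this thread).

```python
from typing import Any, Dict, List


def paginate(records: List[Dict[str, Any]], offset: int, limit: int,
             sort_by: str, order: str = "asc") -> Dict[str, Any]:
    """Sort the records and return one page together with the position metrics."""
    if offset < 1 or limit < 1:
        raise ValueError("offset and limit must be >= 1")
    ordered = sorted(records, key=lambda r: r[sort_by], reverse=(order == "desc"))
    start = offset - 1  # assumption: offset is a 1-based record index
    return {
        "total": len(ordered),
        "offset": offset,
        "limit": limit,
        "records": ordered[start:start + limit],
    }


# Illustrative records with a hypothetical "site" property.
records = [{"site": s} for s in (30, 10, 20)]
page = paginate(records, offset=2, limit=2, sort_by="site")
print(page["total"], [r["site"] for r in page["records"]])
```

In a real service the sorting and slicing would happen in the database query rather than in memory.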

Open questions

@rykahsay and @sujeetvkulkarni please review, and let's finalize this tomorrow.

sujeetvkulkarni commented 1 year ago

Glycosylation card pagination - Pagination for each tab needs to be managed by the server.

Card level summary - Glycosylation Summary: 5 site(s) total, 55 N-linked annotation(s) at 4 site(s), 3 O-linked annotation(s) at 1 site(s)

Tab level summary - Summary: 5 site(s) total, 14 N-linked glycan(s) at 4 site(s), 2 O-linked glycan(s) at 1 site(s)

Both the card-level and the tab-level summary information need to come from the server.

In my opinion, the initial data (e.g. max. 20 entries per table) for all the paginated tables should come from the details API, and a further webservice call should only be triggered when the user clicks on a different page. This would reduce API calls and page load time.

ReneRanzinger commented 1 year ago

Conclusion of the developer meeting on 4/12/2023

Details web service changes

When requesting the JSON from a details web service, the frontend will send a list of data properties that should be server-side paginated (@sujeetvkulkarni will provide an example of what this looks like). The details JSON response has a "section_stats" property that is an array similar to the "tool_support" array on glycan details. It lists all the server-side paginated sections with the number of records. Some sections, such as the glycosylation section, will require a nested object to account for the total summary

Glycosylation Summary: 5 site(s) total, 55 N-linked annotation(s) at 4 site(s), 3 O-linked annotation(s) at 1 site(s)

and the summary shown on each of the tabs:

Summary: 5 site(s) total, 14 N-linked glycan(s) at 4 site(s), 2 O-linked glycan(s) at 1 site(s)

@sujeetvkulkarni please provide an example of how the glycosylation section of the "section_stats" should look.

By default, all tabular sections are server-side paginated, and the JSON contains no data for these sections except for the "section_stats". If the API call contains a list of sections that should be paginated (wishlist), two things change:

  1. For all sections not in the list, the complete list of all records is included in the JSON (as it is right now).
  2. Each wishlist section contains only the records for the first "page" of its table. To retrieve more data, the pagination webservice needs to be called.
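
The default and wishlist behavior described above can be sketched as a small function. The function name, the flat name-to-count form of "section_stats", and the page size of 20 are assumptions for illustration; the final shape of "section_stats" is discussed further down in this thread.

```python
from typing import Dict, List, Optional


def build_details(sections: Dict[str, List[dict]],
                  wishlist: Optional[List[str]] = None,
                  page_size: int = 20) -> dict:
    """Assemble the tabular part of a details response."""
    # The record counts are always present (hypothetical flat layout).
    response: dict = {
        "section_stats": {name: len(rows) for name, rows in sections.items()}
    }
    if wishlist is None:
        # Default: every tabular section is paginated, so no records are embedded.
        return response
    for name, rows in sections.items():
        if name in wishlist:
            response[name] = rows[:page_size]  # first "page" only
        else:
            response[name] = rows  # complete list, as it is right now
    return response


# Illustrative data: 50 phosphorylation rows and 3 publication rows.
sections = {
    "phosphorylation": [{"n": i} for i in range(50)],
    "publication": [{"n": i} for i in range(3)],
}
default = build_details(sections)
wished = build_details(sections, wishlist=["phosphorylation"])
print(len(wished["phosphorylation"]), len(wished["publication"]))
```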

Pagination webservice

We will try to implement a general pagination webservice for all types of data. The downside is that the JSON schema will have to be very general to cover all the different types of tabular data, which will make automated, schema-based error checking of the response very hard. Another issue is that sorting will be more complex, since the sorting options depend on the data type.

Input to the webservice:

The response should include the position metric (total number, offset, limit) and the data records (similar to list pages)
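
One way a general pagination webservice might deal with data-type-dependent sorting is to validate the requested sort key against a per-table allow list. The sketch below is an assumption throughout: the table names reuse identifiers from this thread, but the sort fields (`position`, `residue`, `year`, `title`) and the function name are hypothetical.

```python
from typing import Dict, Set

# Hypothetical allow list of sortable fields per table.
SORT_KEYS: Dict[str, Set[str]] = {
    "phosphorylation": {"position", "residue"},
    "publication": {"year", "title"},
}


def check_sort(table_id: str, sort: str, order: str = "asc") -> None:
    """Raise ValueError if the sort request is not valid for the given table."""
    if table_id not in SORT_KEYS:
        raise ValueError(f"unknown table_id: {table_id}")
    if sort not in SORT_KEYS[table_id]:
        raise ValueError(f"cannot sort {table_id} by {sort}")
    if order not in ("asc", "desc"):
        raise ValueError(f"invalid sort order: {order}")


check_sort("publication", "year")  # accepted
try:
    check_sort("publication", "position")  # field belongs to another table
except ValueError as err:
    print(err)
```

With one webservice per data type, this lookup would not be needed because each service would know its own fields.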

ReneRanzinger commented 1 year ago

@sujeetvkulkarni

@rykahsay

sujeetvkulkarni commented 1 year ago

@ReneRanzinger @rykahsay Please review the object below for the glycosylation summary. It addresses the task: provide an example of how the "section_stats" should look for the glycosylation data.

"section_stats": {
    "phosphorylation": 1500,
    "mutagenesis": 200,
    "glycosylation_reported_with_glycans": 100,
    "glycosylation_reported": 55,
    ...
    "glycosylation_summary": {
        "total_sites": 5,
        "n_linked_annotaions": 14,
        "n_linked_annotaion_sites": 14,
        "o_linked_annotaions": 14,
        "o_linked_annotaion_sites": 14,
        "reported_with_glycans": {
            "total_sites": 5,
            "n_linked_glycans": 14,
            "n_linked_glycan_sites": 14,
            "o_linked_glycans": 14,
            "o_linked_glycan_sites": 14
        },
        "reported": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        },
        "predicted": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        },
        "text_mining": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        }
    }
}
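
As a rough illustration of how the frontend could turn such an object into the card-level summary text quoted earlier in this thread, here is a small Python sketch. The function name and the mapping of fields to positions in the sentence are assumptions; the keys are spelled exactly as in the proposal above, and the sample values are taken from the summary line quoted in the thread.

```python
def card_summary(stats: dict) -> str:
    """Format the card-level glycosylation summary from the proposed fields."""
    # Keys are spelled exactly as in the proposed object above.
    return (
        f"Glycosylation Summary: {stats['total_sites']} site(s) total, "
        f"{stats['n_linked_annotaions']} N-linked annotation(s) at "
        f"{stats['n_linked_annotaion_sites']} site(s), "
        f"{stats['o_linked_annotaions']} O-linked annotation(s) at "
        f"{stats['o_linked_annotaion_sites']} site(s)"
    )


# Values taken from the summary line quoted earlier in this thread.
summary = card_summary({
    "total_sites": 5,
    "n_linked_annotaions": 55,
    "n_linked_annotaion_sites": 4,
    "o_linked_annotaions": 3,
    "o_linked_annotaion_sites": 1,
})
print(summary)
```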

sujeetvkulkarni commented 1 year ago

@ReneRanzinger @rykahsay Please review the object below for the details API with the wishlist of paginated tables. It addresses the task: provide an example of how a details webservice call will look when it includes the "wishlist" of paginated sections.

Current API call: https://api.glygen.org/protein/detail/P14210-1

Proposed example with the list of paginated tables:


/protein/detail?query={
    "uniprot_canonical_ac":"P14210-1",
    "offset":1,
    "limit":20,
    "order":"asc",
    "paginated_tables":[
        "glycosylation_with_glycans",
        "glycosylation_reported",
        "glycosylation_predicted",
        "glycosylation_text_mining",
        "phosphorylation",
        "publication"
    ]
}

ReneRanzinger commented 1 year ago

@sujeetvkulkarni we need to confirm with @rykahsay, but I would prefer to keep the API call semantic (including the protein ID). Another problem is that the limit and sort criteria can differ for each table, so I think it has to be a list of objects rather than strings.

sujeetvkulkarni commented 1 year ago

Making paginated_tables a list of objects so that each table can have a different offset, limit, and order. We don't strictly need an offset field, since every table in the details API starts from 1 and a separate table-specific API is called to retrieve further results, but I am keeping it for consistency. The sort keys for each table still need to be defined.

/protein/detail/P14210-1?query={
    "uniprot_canonical_ac": "P14210-1",
    "paginated_tables": [
        {
            "table_id": "glycosylation_with_glycans",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        },
        {
            "table_id": "glycosylation_reported",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        },
        {
            "table_id": "glycosylation_predicted",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        },
        {
            "table_id": "glycosylation_text_mining",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        },
        {
            "table_id": "phosphorylation",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        },
        {
            "table_id": "publication",
            "offset": 1,
            "limit": 20,
            "sort": "key",
            "order": "asc"
        }
    ]
}
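
Because the JSON goes into the URL's query string, it has to be percent-encoded before the request is sent. A minimal Python sketch of building such a URL is shown below; the function name is hypothetical, and the path and `query` parameter simply follow the proposal above, so this is not a confirmed API.

```python
import json
from urllib.parse import quote, unquote


def details_url(accession: str, paginated_tables: list,
                base: str = "https://api.glygen.org") -> str:
    """Build the proposed details URL with a percent-encoded JSON query."""
    query = {
        "uniprot_canonical_ac": accession,
        "paginated_tables": paginated_tables,
    }
    # Compact JSON, then encode every reserved character for use in a URL.
    encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
    return f"{base}/protein/detail/{accession}?query={encoded}"


tables = [
    {"table_id": "publication", "offset": 1, "limit": 20, "sort": "key", "order": "asc"},
]
url = details_url("P14210-1", tables)
# Decoding the query again gives back the original object.
decoded = json.loads(unquote(url.split("?query=", 1)[1]))
print(decoded["paginated_tables"][0]["table_id"])
```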

rykahsay commented 1 year ago

@sujeetvkulkarni @ReneRanzinger ...

Can we change the "section_stats" as follows to give it a consistent structure (and fewer fields in the schema)?

{
    "section_stats" :[
        {
            "table_id":"glycosylation_reported_with_glycans",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "total_sites", "count": 5}
                ,{"field": "n_linked_glycans", "count": 14}
                ,{"field": "n_linked_glycan_sites", "count": 14}
                ,{"field": "o_linked_glycans", "count": 14}
                ,{"field": "o_linked_glycan_sites", "count": 14}
            ]
        },
        {
            "table_id":"glycosylation_reported",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "total_sites", "count": 5}
                ,{"field": "n_linked_glycans", "count": 14}
                ,{"field": "n_linked_glycan_sites", "count": 14}
                ,{"field": "o_linked_glycans", "count": 14}
                ,{"field": "o_linked_glycan_sites", "count": 14}
            ]
        },
        {
            "table_id":"phosphorylation",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "xx", "count": 100}
                ,{"field": "yyy", "count": 100}
            ]
        }
    ]
}
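
A consumer of this list-based structure would probably index it once by table_id. The sketch below assumes the layout shown above; the helper names are hypothetical, the second sample entry with a total of 0 is made up, and treating the "total" field as the signal for "no data" is an assumption.

```python
from typing import Dict, List


def index_stats(section_stats: List[dict]) -> Dict[str, Dict[str, int]]:
    """Turn the list form into {table_id: {field: count}} for quick lookups."""
    return {
        table["table_id"]: {s["field"]: s["count"] for s in table["table_stats"]}
        for table in section_stats
    }


def has_data(index: Dict[str, Dict[str, int]], table_id: str) -> bool:
    """Assumption: a missing table or a "total" of 0 means there is nothing to load."""
    return index.get(table_id, {}).get("total", 0) > 0


section_stats = [
    {"table_id": "glycosylation_reported_with_glycans",
     "table_stats": [{"field": "total", "count": 100},
                     {"field": "total_sites", "count": 5}]},
    # Made-up entry to show the empty case.
    {"table_id": "phosphorylation",
     "table_stats": [{"field": "total", "count": 0}]},
]
index = index_stats(section_stats)
print(has_data(index, "glycosylation_reported_with_glycans"), has_data(index, "phosphorylation"))
```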

ReneRanzinger commented 1 year ago

@rykahsay that is fine with me, but we also need a glycosylation summary (total for all glycosylation). Do you want to make a table_id "glycosylation" or "glycosylation_summary" for this?

Please do not comment on closed tickets. Reopen the ticket instead; otherwise it will be "hidden" by default and we may miss that there is something to do.

sujeetvkulkarni commented 1 year ago

@rykahsay it's fine, but as Rene said, please include the glycosylation summary (total for all glycosylation) details.