Strategy for server side pagination

ReneRanzinger commented 1 year ago

How many web services do we need? How are they linked to the details service that is called when the page first loads? If the paginated data is omitted from the details response, how does the frontend notice that there is no data at all?

Blocker for:

246
247
521

ReneRanzinger commented 1 year ago

Proposed strategy

The details pages do not include data from server-side paginated sections. Instead, the details JSON has an "additional_data" property, an array similar to the "tool_support" array on glycan details, that lists every server-side paginated section with its number of records. The frontend loads the details JSON and, based on that array, triggers webservice calls to load the first "page" of each server-side paginated data table. If the number in the "additional_data" array for a data type is 0, no webservice call is triggered.
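A minimal sketch of the frontend decision described above. The shape of the "additional_data" entries (objects with "section" and "count" keys) is an assumption for illustration; the thread does not fix the exact layout.

```python
from typing import Dict, List


def sections_to_load(details: Dict) -> List[str]:
    """Return the paginated sections that need a first-page request.

    Assumes (hypothetically) that details["additional_data"] is a list of
    {"section": <name>, "count": <number of records>} objects.
    """
    return [
        entry["section"]
        for entry in details.get("additional_data", [])
        if entry["count"] > 0  # a count of 0 means no webservice call
    ]


details = {
    "additional_data": [
        {"section": "glycosylation", "count": 58},
        {"section": "phosphorylation", "count": 0},
        {"section": "publication", "count": 12},
    ]
}
print(sections_to_load(details))
```

A missing "additional_data" property is treated the same as an empty array here, so no calls are made.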

There could be (a) one webservice for each server-side paginated data type or (b) one general webservice that takes additional parameters. Although it means more work and more webservices, my preference is (a). It's easier to document (for external users), and the JSON schema for the response can be specific to the data type, which allows better (automated) error checking on the response. It also makes sorting less complex, since you can only sort by properties present in the current table.

Assuming (a): all webservices need the ID for the details page (GlyTouCan ID, UniProtAcc), the offset, the limit, the sort-by property, and the sort direction. The response should include the position metric (total number, offset, limit) and the data records.
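As a rough sketch of what such a per-data-type service could return, assuming a 1-based offset and placeholder response keys ("pagination", "results") that are not an agreed schema:

```python
from typing import Any, Dict, List


def paginate(
    records: List[Dict[str, Any]],
    offset: int,
    limit: int,
    sort_by: str,
    descending: bool = False,
) -> Dict[str, Any]:
    """Sort, slice, and wrap records with the position metric.

    Assumes a 1-based offset; the keys "pagination" and "results"
    are placeholders for illustration only.
    """
    ordered = sorted(records, key=lambda r: r[sort_by], reverse=descending)
    start = offset - 1
    return {
        "pagination": {"total": len(records), "offset": offset, "limit": limit},
        "results": ordered[start:start + limit],
    }


rows = [{"position": p} for p in (396, 12, 275, 88, 140)]
page = paginate(rows, offset=1, limit=2, sort_by="position")
print(page["pagination"], page["results"])
```

The total is computed before slicing, so the frontend can work out the number of pages from a single response.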

Open questions

Currently there are sections in which the frontend splits the data into different tables (tabbed sections). It will not work to fetch just the first 20 entries and have the frontend sort them into different tables (one table gets 15 entries, the second 5, and the third is empty although there are records for it). We may have to create separate webservices in that case, or the webservice for the tabbed data gets an additional parameter that tells the backend which records to filter for.
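The problem can be reproduced with a small sketch. The record layout (a "tab" key on each record) and the tab names are hypothetical; the point is only that filtering has to happen before the page is cut.

```python
from typing import Dict, List

# Hypothetical records: each carries the tab it belongs to.
records = (
    [{"tab": "reported_with_glycans", "n": i} for i in range(15)]
    + [{"tab": "reported", "n": i} for i in range(5)]
    + [{"tab": "predicted", "n": i} for i in range(30)]
)


def page_for_tab(rows: List[Dict], tab: str, limit: int = 20) -> List[Dict]:
    """Filter by tab first, then take the first page."""
    return [r for r in rows if r["tab"] == tab][:limit]


# Naive approach: first 20 records overall, split by the frontend afterwards.
naive = [r for r in records[:20] if r["tab"] == "predicted"]
print(len(naive))                               # predicted tab looks empty
print(len(page_for_tab(records, "predicted")))  # backend filter fills the page
```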

@rykahsay and @sujeetvkulkarni please review, and let's finalize this tomorrow.

sujeetvkulkarni commented 1 year ago

Glycosylation card pagination: the server needs to manage pagination for each tab.

Card level summary - Glycosylation Summary: 5 site(s) total, 55 N-linked annotation(s) at 4 site(s), 3 O-linked annotation(s) at 1 site(s)

Tab level summary - Summary: 5 site(s) total, 14 N-linked glycan(s) at 4 site(s), 2 O-linked glycan(s) at 1 site(s)

Both the card-level and the tab-level summary information need to come from the server.

In my opinion, the initial data (e.g. max 20 entries per table) for all the paginated tables should come from the details API, and a further webservice call should only be triggered when the user clicks through to a different page. This would reduce API calls and page load time.

ReneRanzinger commented 1 year ago

Conclusion of the developer meeting on 4/12/2023

Details web service changes

When requesting the JSON from a details web service, the frontend will send a list of data properties that should be server-side paginated (@sujeetvkulkarni will provide an example of what this looks like). The details JSON response has a "section_stats" property, an array similar to the "tool_support" array on glycan details, that lists all server-side paginated sections with their number of records. Some sections, such as the glycosylation section, will require a nested object to account for the total summary

Glycosylation Summary: 5 site(s) total, 55 N-linked annotation(s) at 4 site(s), 3 O-linked annotation(s) at 1 site(s)

and the summary shown on each of the tabs

Summary: 5 site(s) total, 14 N-linked glycan(s) at 4 site(s), 2 O-linked glycan(s) at 1 site(s)

@sujeetvkulkarni: provide an example of how the glycosylation section of "section_stats" should look.

By default, all tabular sections are server-side paginated, and none of their data is in the JSON except for the "section_stats". If the API call contains a list of sections that should be paginated (wishlist), two things change:

For all sections not in the list, the complete list of records is included in the JSON (as it is right now).
Each wishlist section contains only the records for the first "page" of its table. To retrieve more data, the pagination webservice needs to be called.
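One way to read the rules above, as a sketch only. The function name, the flat {section: count} form of "section_stats", and the page size of 20 are assumptions for illustration, not the agreed response format.

```python
from typing import Any, Dict, List, Optional


def build_details(
    sections: Dict[str, List[Any]],
    wishlist: Optional[List[str]] = None,
    page_size: int = 20,
) -> Dict[str, Any]:
    """Assemble the tabular part of a details response.

    "section_stats" is simplified here to a flat {section: count} map.
    """
    response: Dict[str, Any] = {
        "section_stats": {name: len(rows) for name, rows in sections.items()}
    }
    if wishlist is None:
        # Default: every tabular section is paginated, so no records are sent.
        return response
    for name, rows in sections.items():
        # Wishlist sections get the first page only; all others are complete.
        response[name] = rows[:page_size] if name in wishlist else rows
    return response


sections = {"phosphorylation": list(range(50)), "publication": list(range(8))}
default = build_details(sections)
partial = build_details(sections, wishlist=["phosphorylation"])
print(sorted(default), len(partial["phosphorylation"]), len(partial["publication"]))
```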

Pagination webservice

We will try to implement a general pagination webservice for all types of data. The downside is that the JSON schema will have to be very general to cover all the different types of tabular data, which will make automated schema-based error checking of the response very hard. Another issue is that sorting will be more complex, since the sorting options depend on the data type.

Input to the webservice:

type: Type of page (e.g. "protein", "glycan", "site", "paper", "motif")
id: ID of the entity of the page (e.g. GlyTouCan ID for glycan, UniProtAcc for protein). Site and paper can be a formatted key (e.g. "P07911-1/396" for site, or DOI for paper)
data: Section of data on the page to be exported. Corresponds with the section on the page (e.g. "glycosylation", "phosphorylation", "publication")
tab: (optional) The tab to export. If not present, all data is exported.
offset: Start point for the records to deliver
limit: Number of records to deliver
sort: Sorting criteria.
sort direction: ASC or DESC

The response should include the position metric (total number, offset, limit) and the data records (similar to list pages)
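A sketch of how the input listed above could be modelled and checked on the backend. The sort keys in the registry are invented for illustration, the "order" field stands in for "sort direction", and a 1-based offset is assumed; none of this is an agreed contract.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical registry: which sort keys each section accepts.
ALLOWED_SORT: Dict[str, Set[str]] = {
    "glycosylation": {"position", "residue"},
    "phosphorylation": {"position", "residue"},
    "publication": {"date", "title"},
}


@dataclass
class PageRequest:
    type: str                  # e.g. "protein", "glycan", "site"
    id: str                    # e.g. UniProtAcc or GlyTouCan ID
    data: str                  # section name, e.g. "glycosylation"
    offset: int
    limit: int
    sort: str
    order: str = "asc"         # "asc" or "desc"
    tab: Optional[str] = None  # optional; None means all tabs


def validate(req: PageRequest) -> None:
    """Raise ValueError for requests the service should reject."""
    allowed = ALLOWED_SORT.get(req.data)
    if allowed is None:
        raise ValueError(f"unknown section: {req.data}")
    if req.sort not in allowed:
        raise ValueError(f"cannot sort {req.data} by {req.sort}")
    if req.order.lower() not in ("asc", "desc"):
        raise ValueError(f"invalid sort direction: {req.order}")
    if req.offset < 1 or req.limit < 1:
        raise ValueError("offset and limit must be positive")


validate(PageRequest("protein", "P14210-1", "glycosylation", 1, 20, "position"))
```

A registry like this could also be published to the frontend so that it knows which sort values each data type accepts.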

ReneRanzinger commented 1 year ago

@sujeetvkulkarni

Provide an example of how the "section_stats" should look for the glycosylation data.
Provide an example of what a details webservice call will look like when it includes the "wishlist" of paginated sections

@rykahsay

Provide an example of the JSON that is sent to the pagination webservice
How do we agree on the sort property? How does the frontend know which property values are allowed for each of the different data types?

sujeetvkulkarni commented 1 year ago

@ReneRanzinger @rykahsay Please review the object below for the glycosylation summary. It answers the request for an example of how "section_stats" should look for the glycosylation data.

"section_stats": {
    "phosphorylation": 1500,
    "mutagenesis": 200,
    "glycosylation_reported_with_glycans": 100,
    "glycosylation_reported": 55,
    ...
    "glycosylation_summary": {
        "total_sites": 5,
        "n_linked_annotaions": 14,
        "n_linked_annotaion_sites": 14,
        "o_linked_annotaions": 14,
        "o_linked_annotaion_sites": 14,
        "reported_with_glycans": {
            "total_sites": 5,
            "n_linked_glycans": 14,
            "n_linked_glycan_sites": 14,
            "o_linked_glycans": 14,
            "o_linked_glycan_sites": 14
        },
        "reported": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        },
        "predicted": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        },
        "text_mining": {
            "total_sites": 5,
            "n_linked_annotaions": 14,
            "n_linked_annotaion_sites": 14,
            "o_linked_annotaions": 14,
            "o_linked_annotaion_sites": 14
        }
    }
}
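To show how the frontend could turn these counts into the card-level summary line, here is a minimal sketch. The key names are copied exactly as spelled in the example object, and the values are the ones from the card-level summary quoted earlier in the thread; the function name is hypothetical.

```python
from typing import Dict


def card_summary(stats: Dict[str, int]) -> str:
    """Build the card-level summary line from the counts.

    Key names are copied as spelled in the example object above.
    """
    return (
        f"Glycosylation Summary: {stats['total_sites']} site(s) total, "
        f"{stats['n_linked_annotaions']} N-linked annotation(s) at "
        f"{stats['n_linked_annotaion_sites']} site(s), "
        f"{stats['o_linked_annotaions']} O-linked annotation(s) at "
        f"{stats['o_linked_annotaion_sites']} site(s)"
    )


summary = {
    "total_sites": 5,
    "n_linked_annotaions": 55,
    "n_linked_annotaion_sites": 4,
    "o_linked_annotaions": 3,
    "o_linked_annotaion_sites": 1,
}
print(card_summary(summary))
```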

sujeetvkulkarni commented 1 year ago

@ReneRanzinger @rykahsay Please review the object below for the details API with the paginated tables wishlist. It answers the request for an example of a details webservice call that includes the "wishlist" of paginated sections.

Current API call: https://api.glygen.org/protein/detail/P14210-1

Proposed example with the list of paginated tables:

/protein/detail?query={
    "uniprot_canonical_ac":"P14210-1",
    "offset":1,
    "limit":20,
    "order":"asc",
    "paginated_tables":[
        "glycosylation_with_glycans",
        "glycosylation_reported",
        "glycosylation_predicted",
        "glycosylation_text_mining",
        "phosphorylation",
        "publication"
    ]}

ReneRanzinger commented 1 year ago

@sujeetvkulkarni we need to confirm with @rykahsay, but I would prefer to keep the API call semantic (with the protein ID in the path). The other problem is that the limit and sort criteria can be different for each table, so I think "paginated_tables" has to be a list of objects rather than strings.

sujeetvkulkarni commented 1 year ago

I made paginated_tables a list of objects so that each table can have its own offset, limit, and order. Strictly, we don't need the offset field, since every table in the details API starts from 1 and a separate table-specific API is called to retrieve further results, but I am keeping it for consistency. The sort keys for each table still need to be defined.

/protein/detail/P14210-1?query={
    "uniprot_canonical_ac":"P14210-1",
    "paginated_tables":[
        {
            "table_id": "glycosylation_with_glycans",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        },
        {
            "table_id": "glycosylation_reported",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        },
        {
            "table_id": "glycosylation_predicted",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        },
        {
            "table_id": "glycosylation_text_mining",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        },
        {
            "table_id": "phosphorylation",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        },
        {
            "table_id": "publication",
            "offset":1,
            "limit":20,
            "sort": "key",
            "order":"asc"
        }
    ]}
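Since the query is JSON inside a URL, it has to be percent-encoded by the caller. A small sketch of how a client could build the proposed call; the function name is hypothetical, "sort": "key" is the same placeholder as in the example above, and the endpoint shape is the proposal under discussion, not a confirmed API.

```python
import json
from typing import List
from urllib.parse import quote

BASE = "https://api.glygen.org"  # host taken from the current API call


def details_url(accession: str, tables: List[str], limit: int = 20) -> str:
    """Build the proposed details call with a percent-encoded JSON query."""
    query = {
        "uniprot_canonical_ac": accession,
        "paginated_tables": [
            # "key" is a placeholder; real sort keys are still to be defined.
            {"table_id": t, "offset": 1, "limit": limit, "sort": "key", "order": "asc"}
            for t in tables
        ],
    }
    encoded = quote(json.dumps(query, separators=(",", ":")))
    return f"{BASE}/protein/detail/{accession}?query={encoded}"


url = details_url("P14210-1", ["phosphorylation", "publication"])
print(url)
```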

rykahsay commented 1 year ago

@sujeetvkulkarni @ReneRanzinger ...

Can we change the "section_stats" as follows to give it a consistent structure (and fewer fields in the schema)?

{
    "section_stats" :[
        {
            "table_id":"glycosylation_reported_with_glycans",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "total_sites", "count": 5}
                ,{"field": "n_linked_glycans", "count": 14}
                ,{"field": "n_linked_glycan_sites", "count": 14}
                ,{"field": "o_linked_glycans", "count": 14}
                ,{"field": "o_linked_glycan_sites", "count": 14}
            ]
        },
        {
            "table_id":"glycosylation_reported",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "total_sites", "count": 5}
                ,{"field": "n_linked_glycans", "count": 14}
                ,{"field": "n_linked_glycan_sites", "count": 14}
                ,{"field": "o_linked_glycans", "count": 14}
                ,{"field": "o_linked_glycan_sites", "count": 14}
            ]
        },
        {
            "table_id":"phosphorylation",
            "table_stats":[
                {"field": "total", "count": 100}
                ,{"field": "xx", "count": 100}
                ,{"field": "yyy", "count": 100}
            ]
        }
    ]
}
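With this list-of-objects layout the frontend needs a lookup step before it can read a single count. A minimal sketch of that step, using a hypothetical helper name:

```python
from typing import Dict, List


def index_stats(section_stats: List[Dict]) -> Dict[str, Dict[str, int]]:
    """Turn the list-of-objects layout into {table_id: {field: count}}."""
    return {
        table["table_id"]: {s["field"]: s["count"] for s in table["table_stats"]}
        for table in section_stats
    }


section_stats = [
    {
        "table_id": "glycosylation_reported",
        "table_stats": [
            {"field": "total", "count": 100},
            {"field": "total_sites", "count": 5},
        ],
    },
    {"table_id": "phosphorylation", "table_stats": [{"field": "total", "count": 100}]},
]
stats = index_stats(section_stats)
print(stats["glycosylation_reported"]["total_sites"])
```

The trade-off is a uniform schema on the wire at the cost of one indexing pass on the client.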

ReneRanzinger commented 1 year ago

@rykahsay that is fine with me, but we also need a glycosylation summary (total for all glycosylation). Do you want to make a table_id "glycosylation" or "glycosylation_summary" for this?

Please do not comment on closed tickets. Reopen the ticket instead; otherwise it will be "hidden" by default and we may miss that there is something to do.

sujeetvkulkarni commented 1 year ago

@rykahsay it's fine, but like Rene said, please include the glycosylation summary (total for all glycosylation) details.

glygener / glygen-issues

Strategy for server side pagination #245
