bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0
23 stars 57 forks source link

Report API - Handling large PPR documents #13860

Closed RFK250 closed 1 year ago

RFK250 commented 1 year ago

AC

TBD

Description

The new Report API, while greatly improving performance in general, still has substandard performance when generating reports over a certain number of records. Those will still lead to timeouts, ops tickets and manual file retrievals

Options (to be decided before advancing ticket):

  1. Block reports over 700 records in size + user feedback to advise users to refine their search - dev + design time not yet quantified
  2. Searches over 700 records generate light format PDFs (see below) + user feedback to advise users of why they're seeing this format and what features are missing from the PDF (colour, page numbers, etc) - need to validate from policy perspective (will this be acceptable?) and needs UXA, would still result in some timeouts - ready to go, no dev time needed
  3. Searches over 700 records in size generate concatenated subreports + user feedback to advise of why they're seeing a different report - would still be a full-featured PDF but would take a long time to generate, but should prevent timeouts - needs one day dev time

Notes from original ticket

Large search result report notes: Total # of search requests in PROD: 690920 Total # over 700: 110 Total # over 1000: 48 Some of these are duplicates for failed attempts: New large search report light format for reports containing over 700 registrations.

No colour (similar to mail out) No watermark image. No footer image. No TOC page numbers Reduced header image quality. No page breaks between registrations No page breaks before registration history Performance: 843 registrations: time to generate 10:48, files size 19MB. 1333 registrations: time to generate 14:41, file size 32MB.

Note: removing page numbers and links from the TOC reduces the above time to generate significantly.

    Large search format example:

https://app.zenhub.com/files/157936592/fa5488b6-a259-4baf-90c2-0164fa1ee45c/download

Originally posted by @doug-lovett in https://github.com/bcgov/entity/issues/13161#issuecomment-1273748906

Option 3 concatenated sub reports performance sub reports size 500 registrations: 843 registrations: time to generate 5:46 file size 27.4MB, 1125 pages. 1333 registrations: time to generate 11:14, file size 54.3 MB, 2182 pages.

@forgeuxGH5 @tlebedovich Option 3 large example file can be downloaded from here: https://drive.google.com/file/d/1M_pEjAaPOO0gaHHp1OXHBCRAce9dyXXV/view?usp=sharing

@forgeuxGH5 @tlebedovich Attaching a sample cover page for the large search as a starting point.https://app.zenhub.com/files/157936592/d3bde4da-b065-41a8-a2ae-e7ba65bb295b/download

@forgeuxGH5 @tlebedovich 2022-10-19 TELUS example file link above refreshed with Scott's wireframe changes.

RFK250 commented 1 year ago

@doug-lovett @tlebedovich @forgeuxGH5 Tagging you to see this ticket. I think I need to run the options by Kevin, Linda and possibly Kaine before we can advance on this ticket. @doug-lovett We should keep the original PPR report ticket moving through the pipeline.

forgeuxGH5 commented 1 year ago

@doug-lovett just to clarify - would option 3 above mean the report would look and work just like the current report from the user's perspective (same colours, watermark, page numbers, links, etc.)? That option is more about how the report is put together. I just want to understand if that option would or wouldn't mean there would be a table of contents with different page numbering for each of the concatenated sub-reports.

doug-lovett commented 1 year ago

@forgeuxGH5 yes, option 3 could be multiple sub-reports concatenated together in the same format as the current search reports. Each sub-report would have working TOC links and page numbers for that particular sub-report. I think we could also add a summary page specifying the number of sub reports and the number of registrations and pages within each sub-report. The sub-report size would be 400-500 registrations depending on performance. I would start with 500.

RFK250 commented 1 year ago

@doug-lovett @forgeuxGH5 @tlebedovich FYI, I started writing out a message to Kevin and in doing so answered my own question: option 3 is the way to go b/c it results in an outcome that is closest to the outcome obtained by other clients requesting reports (including being better than an outcome that requires them to request a manual file retrieval).

forgeuxGH5 commented 1 year ago

@RFK250 it would be good to get Kevin's perspective though (is that still the plan?)

RFK250 commented 1 year ago

@forgeuxGH5 It would be good in the sense that it's always nice to have Kevin's input, but our use of him is limited and so I have to be mindful of what I choose to ask him. I don't think this is one of those occasions, though I'm open to listening if anyone thinks 3. is somehow not a better option than 1 or 2 or the status quo (i.e. manual file retrievals).

RFK250 commented 1 year ago

@doug-lovett Can you please send an example of what the concatenated report could look like? I'm thinking that @forgeuxGH5 and @tlebedovich might want to do some UX, and I will need to engage OCM since it represents a change to users' current experiences (albeit to a super small percentage of cases).

doug-lovett commented 1 year ago

@RFK250 working on the concatenated version now.

RFK250 commented 1 year ago

@doug-lovett Please add an estimate when you take this - thanks!

forgeuxGH5 commented 1 year ago

@doug-lovett I had a look at the sample outputs - I'm concerned that users could be confused/mislead if a printed report is separated and someone only reads a sub-report. So I'd recommend we do some changes to indicated the main/sub-report status. I'll mock these up but wanted to let you know where we were at - maybe see what you think about the feasibility/effort for these:

Sub-reports (reference name TBD) Sub-reports need to have an indication that it is a sub-report, e.g., "Sub-report 1/2" Should have text explaining limit of 500 registrations per report Need to explain that matches and exact matches are for the given report - not the total number found Change title to Sub-report TOC (wording TBD) Add Sub-report x/y to footer

Parent Report Show Exact Matches per report Show start and end date of registrations in report?

UI (new ticket needed) Add message to UI that reports over 500 registrations will display multiple sub-reports

doug-lovett commented 1 year ago

@forgeuxGH5 the Sub-reports changes are feasible. For the parent report, I am not sure about the start and end dates of registrations. Do you mean the date of the first registration in a sub-report, and the date of the last registration within a sub-report? Or the date range of all registrations (search results across all sub-reports)?

And the threshold to switch to this format is 700 registrations, not 500.

forgeuxGH5 commented 1 year ago

@doug-lovett

Do you mean the date of the first registration in a sub-report, and the date of the last registration within a sub-report? Or the date range of all registrations (search results across all sub-reports)?

The former - date of first registration and date of last registration in a sub-report.

@forgeuxGH5 That's also doable.

forgeuxGH5 commented 1 year ago

@doug-lovett I created a design ticket as a blocker above - right now it's just wireframes but @tlebedovich will talk about getting visual design comps in there for you

RFK250 commented 1 year ago

@forgeuxGH5 This would be a good one to present at PO-Design Sync tomorrow I think.

doug-lovett commented 1 year ago

Final example of cover page from UX design attached below.

https://app.zenhub.com/files/157936592/be056644-a94a-4af8-b62d-ac0d21d1da17/download

doug-lovett commented 1 year ago

Final example of cover page from UX design attached below.

https://app.zenhub.com/files/157936592/be056644-a94a-4af8-b62d-ac0d21d1da17/download

forgeuxGH5 commented 1 year ago

@doug-lovett UXA comments on the cover page - @tlebedovich will need to provide some additional info Monday

The alignment and size of the BC Registry Service logo in the footer are off but if this gets us more room in the footer for text I'm OK with it. We need to make sure it is consistent with the sub-report template though.

doug-lovett commented 1 year ago

@forgeuxGH5 here are my comments

The footer logo is the same across all PPR and MHR reports.

https://app.zenhub.com/files/157936592/ba76c3bd-82ce-40fb-bbb4-5d1461e401f5/download

forgeuxGH5 commented 1 year ago

@doug-lovett I'll update the checkbox list - one question / comment about the threshold/size limit, and maybe we need an offiline discussion if the answer is complicated. The comment: Having the threshold and size limit set a different numbers will confuse users (it definitely confused me) Questions: what is the advantage to having them set differently? what is the impact if we set them at the same value? From a user experience perspective I don't think this is super critical but this report is already going to be unexpected and potentially confusing so I'm just looking to minimise potential confusion anywhere I can.

doug-lovett commented 1 year ago

@forgeuxGH5 the impact of setting the threshold to 700 is a handful of reports per month. 700 is around the point where the amount of time for a report to generate starts to significantly degrade. For a report of 800, subreports of 500 and 300 is significantly faster than 700 and 100. Setting the threshold to match the subreport size of 500 will impact more users - I chose 700 as a compromise.

RFK250 commented 1 year ago

I think 700 is a good threshold to start with - there's nothing saying we can't tweak it later if it becomes prudent. From a numbers perspective, reports over 700 represent 0.02% of all reports generated, i.e. 99.98% of the reports will be generated normally.

I agree @forgeuxGH5 that we should spend some time to make it clear to the 0.02% that their report is going to be different, but given the infrequency maybe this is one case where a modal in the UI (e.g. before generating the report) would be warranted and would be enough to alert people that something will be different than expected - maybe even give them the option to go back and refine their search results.

forgeuxGH5 commented 1 year ago

@doug-lovett @RFK250 OK - let me have a look at it really quick - I think I can word it to reduce potential confusion and keep the threshold and sub-report sizes as they are set currently (700/500).

forgeuxGH5 commented 1 year ago

@doug-lovett can you update the text in the yellow alert box to "This search result report contains a large number of registrations and has been divided into multiple sub-reports. "

image
forgeuxGH5 commented 1 year ago

@RFK250 we have an existing modal in the UI for large search results - we could add some info there about the report structure - we'd need a ticket to do that.

image
forgeuxGH5 commented 1 year ago

@doug-lovett @RFK250 I think with the text change above this ticket could move forward and we could handle any of the visual design things in a separate ticket to be done as enhancements.

RFK250 commented 1 year ago

Thanks @forgeuxGH5 I concur - let's move this one forward and I'll make a ticket to update the large search modal.

chdivyareddy commented 1 year ago

Example of Large search result: 'MR Motors LP' which took around 9 minutes to generate 'Total Registrations in Report: 927' in DEV:

image.png

image.png

RFK250 commented 1 year ago

Absolutely fantastic!!! I love the yellow text box. Great work @doug-lovett and @forgeuxGH5 solutioning this so thoroughly. Thanks @chdivyareddy for testing and sharing the output. :)

doug-lovett commented 1 year ago

@DivyaC @forgeuxGH5 @RFK250 we have hit a GCP limit searching with TELUS file size > 53MB. There is a GCP maximum response size limit of 32MB on container requests between the ppr api and the report api. We are hitting this limit when merging/concatenating the reports. We need to find another solution. The stripped down version of reports I initially presented creates a file size of 27MB with the Telus search.. I will chat with Thor about coming up with some more options.

RFK250 commented 1 year ago

@doug-lovett Oh no! That's unfortunate. Okay, thanks for the update. I trust that the "light" version you mocked up was also potentially larger than that file size? Edit: I see it now in the description... So yes, potentially could be 32MB or greater.

doug-lovett commented 1 year ago

@RFK250 the light version was close to 32MB, so it would have to be stripped down some more to stay under the limit: no logo or a smaller one and minimize the number of pages.

RFK250 commented 1 year ago

@doug-lovett Right. Okay, definitely talk to Thor and see what other options there are, if there even are other options. Thanks Doug.

doug-lovett commented 1 year ago

@chdivyareddy @RFK250 large file > 32MB generation and download issues are resolved. A 50+ MB file for TELUS can be downloaded now in dev from account BCREG0057.

UXA can review outputs for the above account from the UI.

RFK250 commented 1 year ago

@doug-lovett Amazing!! How did you do it?

chdivyareddy commented 1 year ago

@doug-lovett , I'm able to download the TELUS report from DEV, thank you!!

image.png

tlebedovich commented 1 year ago

@doug-lovett Visual Design UXA for large reports:

Screen Shot 2022-10-26 at 2.38.57 PM.png


Screen Shot 2022-10-26 at 2.37.25 PM.png


Screen Shot 2022-10-26 at 2.29.16 PM.png

doug-lovett commented 1 year ago

@tlebedovich unfortunately when merging pdfs the internal links no longer work both with the report api and with a python pdf library. For item 2 keep in mind prod does not have TEST DATA |. I increased the truncation cutoff to the maximum of 75 for a sub report after adjusting the footer margins and table cells. A sample partial report with the format changes is provided below.

https://app.zenhub.com/files/157936592/d42a7978-ce63-40d3-b0a8-9f7cbbcffed7/download

tlebedovich commented 1 year ago

@chdivyareddy - Moving to RFQA. There is one leftover item where the MHR and page number links don't/can't work (see above), can you please log a UX bug to remove the blue colour and underlines for the sub report table of contents.

chdivyareddy commented 1 year ago

@tlebedovich Created #14044 to remove hyperlinks, thanks!

chdivyareddy commented 1 year ago

Verified!

image.png

image.png