SIS-Team-24 / 2023-SIS-Team-24

Using Natural Language Processing, Text Insights examines and provides insights to text.
1 stars 0 forks source link

Add pdf input feature #186

Closed henrygoodman closed 1 year ago

henrygoodman commented 1 year ago

Closes #136

Currently implemented:

Potential enhancements:

henrygoodman commented 1 year ago

https://github.com/SIS-Team-24/2023-SIS-Team-24/assets/79623665/35f4e140-9480-4c00-b42f-94c2756294ef

My screen capture can only capture one app at a time, but I am dragging a PDF file from Windows Explorer and dropping into the input textarea

gracebilliris commented 1 year ago

@henrygoodman Would it be possible to upload the PDF and it automatically populate the Text to be summarised textarea as part of this feature?

henrygoodman commented 1 year ago

@henrygoodman Would it be possible to upload the PDF and it automatically populate the Text to be summarised textarea as part of this feature?

How are you defining upload here? Currently it can 'upload' by drag and drop, which will then auto populate the input textarea (like I explained the video doesn't show it well since it only records the browser and not the file explorer, but if you checkout this PR you'll see what I mean).

Are you suggesting something also like a button to click, bring up a file explorer type thing?

If so, we can decide the best layout for this (i.e. wireframes for the upload workflow), just don't want to make too many assumptions about how it should work

gracebilliris commented 1 year ago

@henrygoodman Would it be possible to upload the PDF and it automatically populate the Text to be summarised textarea as part of this feature?

How are you defining upload here? Currently it can 'upload' by drag and drop, which will then auto populate the input textarea (like I explained the video doesn't show it well since it only records the browser and not the file explorer, but if you checkout this PR you'll see what I mean).

Are you suggesting something also like a button to click, bring up a file explorer type thing?

If so, we can decide the best layout for this (i.e. wireframes for the upload workflow), just don't want to make too many assumptions about how it should work

My original understanding/perspective of this feature was to have the file explorer pop up to upload the PDF. However, it should be okay to leave it like this but we will need to add a tooltip or some help instructions for this feature since it might not be as obvious... I am trying now to upload a PDF via drag and drop, and have received this error: image

Update: the PDF with 49 pages gave that error, however, the 15-page PDF worked. Is there a limit that we should give to the users in the instructions/tool tip?

henrygoodman commented 1 year ago

I think there may be filesize thresholds for uploading a file over http, not sure off the top of my head. I might just edit the Error popup message to handle if the file is too large (as I suspect is happening here).

If you still have the PR checked out, the specific error should have also been console.log-ged out in the browser, if you want to see if it is actually occurring due to filesize

I'll rebase and add that change in the morning (also add a proposed tooltip)

gracebilliris commented 1 year ago

I think there may be filesize thresholds for uploading a file over http, not sure off the top of my head. I might just edit the Error popup message to handle if the file is too large (as I suspect is happening here).

If you still have the PR checked out, the specific error should have also been console.log-ged out in the browser, if you want to see if it is actually occurring due to filesize

I'll rebase and add that change in the morning (also add a proposed tooltip)

I tried up to 40 pages and the drag-and-drop worked. The console error message said the same as the pop-up for the 50-page PDF: Error processing PDF. Please try again.

I think it might just be safest to say a maximum 40-page PDF - what do you think?

henrygoodman commented 1 year ago

I think it might just be safest to say a maximum 40-page PDF - what do you think?

The problem is that it is most likely not the number of pages causing the error but the amount of content. You can have 40 blank pages or 40 pages full of text, the latter will have hundreds time more information and increased filesize.

I'll troubleshoot in morning, and if it is a filesize threshold then I'll just determine what the maximum safe filesize is (probably around 20MB for example), and just notify the user they should upload files smaller than that (this is what a lot of the online PDF merger-type sites do already)

Also, the console.log should have said Error reading PDF: <actual error message>, it's possible the alert also gets logged in the console. EDIT: It's actually console.errorso it may appear in the error messages log

henrygoodman commented 1 year ago

@gracebilliris the error above seemed to just be a logical error, as if we attempt to parse a page with NO content on it at all, then we will get a null reference error when trying to access lastItem = page.contents.items[0]. Fixed by adding a nullcheck for lastItem, tested using a 50 page document, 200 page document (100k words) and uploaded fine in ~2 seconds.

Also, added tooltip: image