UAL-RE / LD-Cool-P

Python tool to enable data curation
MIT License

Feature: Retrieve Deposit Agreement PDF via API #187

Closed by astrochun 3 years ago

astrochun commented 3 years ago

It may be possible to download the PDF without logging in using this method: https://www.qualtrics.com/community/discussion/10362/is-this-possible-to-download-pdf-summary-of-responses-by-using-qualtrics-api-in-python

I tried the URL in a private browser window and it does display the result along with an option to generate a PDF.

_Originally posted by @zoidy in https://github.com/ualibraries/LD_Cool_P/issues/24#issuecomment-805068801_

astrochun commented 3 years ago

This is interesting. We'll do some experimentation. I had searched the Qualtrics community for this before and didn't find a hit, but that might have been before this thread was indexed by Google.

zoidy commented 3 years ago

After playing with this method in depth, it seems that generating a PDF isn't possible, at least not in a lightweight manner (even downloading the HTML and trying to render it locally doesn't work, due to external dependencies). The "generate PDF" button generates the PDF using client-side JavaScript. To download a PDF programmatically, we would need to use something like Selenium to load the page, click the button, and receive the generated file.

astrochun commented 3 years ago

> After playing with this method in depth, it seems that generating a PDF isn't possible, at least not in a lightweight manner (even downloading the HTML and trying to render it locally doesn't work, due to external dependencies). The "generate PDF" button generates the PDF using client-side JavaScript. To download a PDF programmatically, we would need to use something like Selenium to load the page, click the button, and receive the generated file.

I was looking at wkhtmltopdf and couldn't get it to work on my Mac after installing it, so I think keeping external dependencies out is desirable. After some thought, I think I came up with an easy solution that still provides some API functionality/automation while remaining a bit interactive. My working idea is to have an input() prompt where we can copy/paste the link from the email that we get. The tool will then retrieve the PDF and save it in the UAL_RDM folder with the name "Deposit_Agreement.pdf" (we can include the depositor if need be). This would be the most straightforward solution. Thoughts?
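To make the idea concrete, here is a minimal sketch of that workflow. It is only an illustration, not what LD-Cool-P does: the function names (`fetch_bytes`, `save_deposit_agreement`, `prompt_and_save`) are hypothetical, it uses only the standard library, and the `%PDF` header check is an extra sanity check I am assuming we would want.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the raw bytes behind a URL (urlopen raises HTTPError on 4xx/5xx)."""
    with urlopen(url) as response:
        return response.read()


def save_deposit_agreement(url, dest_dir, fetch=fetch_bytes,
                           filename="Deposit_Agreement.pdf"):
    """Retrieve the PDF at `url` and save it under `dest_dir` (hypothetical helper)."""
    content = fetch(url)
    # Assumption: a valid response is a PDF, and PDF files begin with "%PDF".
    if not content.startswith(b"%PDF"):
        raise ValueError("Response does not look like a PDF")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    out_path = dest / filename
    out_path.write_bytes(content)
    return out_path


def prompt_and_save(dest_dir, ask=input, fetch=fetch_bytes):
    """Ask the curator to paste the emailed link, then download and save the PDF."""
    link = ask("Paste the Deposit Agreement PDF link from the email: ").strip()
    return save_deposit_agreement(link, dest_dir, fetch=fetch)
```

Calling `prompt_and_save("UAL_RDM")` would prompt for the link and write `UAL_RDM/Deposit_Agreement.pdf`; passing a different `filename` to `save_deposit_agreement` is one way the depositor's name could be included.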

zoidy commented 3 years ago

I agree, that sounds like the most straightforward solution for now. Something to explore in the future, though it is more complicated, is accessing the email message via an API and parsing out the link (the email contains critical info like article_id, after all).
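For the future email-parsing idea, a rough sketch of the link-extraction step might look like the following. It assumes we already have the raw email text in hand (how it is fetched via an API is out of scope here), and both the `extract_links` name and the `qualtrics.com` host filter are assumptions for illustration; the actual link format in the notification email has not been checked.

```python
import re
from email import message_from_string
from typing import List
from urllib.parse import urlparse

# Crude URL matcher: runs until whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(raw_email: str, host_suffix: str = "qualtrics.com") -> List[str]:
    """Return URLs found in the text parts of an email whose host ends with `host_suffix`."""
    message = message_from_string(raw_email)
    links = []
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:  # multipart containers carry no payload of their own
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;)")  # drop trailing sentence punctuation
            host = urlparse(url).hostname or ""
            if host.endswith(host_suffix):
                links.append(url)
    return links
```

The same approach could pull out other fields such as article_id, given a pattern for how they appear in the message body.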

astrochun commented 3 years ago

Interestingly, the PDF link returns a 400 error for responses that were modified later. So each PDF link is specific to a version of the survey response, and a retake means the previous PDF link no longer works.
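If we want the tool to report this case clearly instead of failing with a traceback, a small check along these lines could work. This is a sketch only: `link_is_stale` is a hypothetical name, and treating HTTP 400 as "the response was retaken" is based solely on the behavior observed above.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def link_is_stale(url, opener=urlopen):
    """Return True if the link answers with HTTP 400, False if it opens normally."""
    try:
        with opener(url):
            return False
    except HTTPError as err:
        # Observed behavior: links to superseded responses return 400.
        if err.code == 400:
            return True
        raise  # any other HTTP error is unexpected, so let it propagate
```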

zoidy commented 3 years ago

> Interestingly, the PDF link returns a 400 error for responses that were modified later. So each PDF link is specific to a version of the survey response, and a retake means the previous PDF link no longer works.

Makes sense. The link looks suspiciously like a JWT auth token (or similar).

astrochun commented 3 years ago

> > Interestingly, the PDF link returns a 400 error for responses that were modified later. So each PDF link is specific to a version of the survey response, and a retake means the previous PDF link no longer works.
>
> Makes sense. The link looks suspiciously like a JWT auth token (or similar).

I'll see if I can figure out how they are generating them. If the link is based on the responses, then maybe that's why it changes. I don't think we will be doing retakes on the Deposit Agreement (we haven't yet).
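One quick way to test the JWT hypothesis would be to take the token-like part of the link and try to decode it. The sketch below assumes the standard JWT layout (three dot-separated base64url segments, with JSON in the first two); `decode_jwt_like` is a hypothetical helper, it does not verify any signature, and whether the Qualtrics link actually follows this layout is unconfirmed.

```python
import base64
import json


def decode_jwt_like(token):
    """Decode the header and payload of a JWT-shaped string, or return None if it isn't one."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    decoded = []
    for segment in parts[:2]:
        # base64url segments in JWTs omit padding; restore it before decoding.
        padded = segment + "=" * (-len(segment) % 4)
        try:
            decoded.append(json.loads(base64.urlsafe_b64decode(padded)))
        except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
            return None
    return {"header": decoded[0], "payload": decoded[1]}
```

If this returns a payload, comparing the payloads of two links for the same respondent would show whether the response version is encoded in the link.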