bcgov / entity

ServiceBC Registry Team working on Legal Entities
Apache License 2.0

Download PDFs into memory instead of onto disk #14701

Closed MatthewCai2002 closed 1 year ago

MatthewCai2002 commented 1 year ago

Currently the COLIN screen scraper downloads PDFs onto local disk.

We want this changed so that PDFs are downloaded into memory instead, for faster data transfer.

Start with the solution in this Stack Overflow question and build from there: https://stackoverflow.com/questions/64618229/using-python-selenium-to-download-a-file-in-memory-not-in-disk

MatthewCai2002 commented 1 year ago

Doing this through Selenium might be a bit tricky. One idea is to use the requests library and Beautiful Soup to handle the download requests, which would give more control over these downloads.
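
A rough sketch of the Beautiful Soup half of that idea, in case it helps. It assumes the PDFs are exposed as plain `<a href="...pdf">` links, which may not be how the COLIN pages actually work. The function name `find_pdf_links` and the example URL are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_pdf_links(html: str, base_url: str) -> list:
    """Return absolute URLs for every anchor whose href ends in .pdf.

    Assumption: PDFs are linked with ordinary <a href> tags. A page that
    triggers downloads through JavaScript or form posts would need a
    different approach.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            # Resolve relative hrefs against the page they came from.
            links.append(urljoin(base_url, href))
    return links
```

The returned URLs can then be fetched with requests instead of clicking through the browser, which is where the extra control over the download comes from.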

MatthewCai2002 commented 1 year ago

Testing whether the requests library can be used to cache PDFs in memory.

MatthewCai2002 commented 1 year ago

The requests library already holds downloaded content in memory by default. For now I am just writing it out to PDF files while doc storage in the modern app gets built.
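
For reference, a minimal sketch of what this looks like, not the actual scraper code. The helper names (`download_pdf_to_memory`, `save_pdf`), the timeout value, and the output path are all hypothetical. With the default `stream=False`, `response.content` is the full body as bytes, so nothing touches disk until it is written out explicitly.

```python
import io
from pathlib import Path
from typing import Optional

import requests


def download_pdf_to_memory(
    url: str,
    session: Optional[requests.Session] = None,
    timeout: float = 30.0,
) -> io.BytesIO:
    """Fetch a PDF and return it as an in-memory buffer."""
    if session is None:
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    # response.content is the whole body held in memory as bytes.
    return io.BytesIO(response.content)


def save_pdf(buffer: io.BytesIO, path: Path) -> None:
    """Temporary step: write the buffer to disk until doc storage exists."""
    path.write_bytes(buffer.getvalue())
```

Once doc storage is ready, the `save_pdf` step could be swapped for an upload that reads straight from the buffer.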