Proposed Change/Activity
Survey the packages and programming languages available for parsing data out of files, and evaluate how well each fits this workflow. Possible candidates include the Python library python-docx and the R packages docxtractr and Rcrawler.
Why is this important
Several tools can accomplish the desired data scraping, so comparing them up front should lead to a better end product.
Additional Context
Any solution will likely need to work for both PDF and Word documents.
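As a starting point for the comparison, here is a minimal Python sketch of what "works for both PDF and Word" could look like: one entry point that dispatches on file extension. It is an illustration under assumptions, not a chosen design. The function names and the EXTRACTORS table are hypothetical, the Word extractor reads the .docx archive with the standard library only (a baseline to compare candidate packages against), and the PDF extractor is deliberately left as a placeholder because no PDF parser has been selected yet.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_paragraphs(path):
    """Return the non-empty paragraph texts of a .docx file (stdlib only)."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the text nodes.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def extract_pdf_paragraphs(path):
    """Placeholder: the PDF backend is still to be chosen in this evaluation."""
    raise NotImplementedError("no PDF parser selected yet")


# Hypothetical dispatch table: one extractor per supported file extension.
EXTRACTORS = {
    ".docx": extract_docx_paragraphs,
    ".pdf": extract_pdf_paragraphs,
}


def extract_paragraphs(path):
    """Pick an extractor from the file extension and run it."""
    suffix = Path(path).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return extractor(path)
```

Keeping the format-specific code behind a single function means a candidate package can be swapped in for one extractor and compared on the same sample files without touching the rest of the workflow.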