dfo-mar-odis / saraDataScraping

Repo to hold code and project management for the SARA data scraping project
MIT License

Extracting recovery measures information from Recovery Documents using Reproducible Analytical Pipelines and identifying best practices for future data entry and reporting

The project enables SARP to test how to efficiently and accurately extract text from Recovery Documents, SARA workplans, and Progress Reports, and to transfer that text automatically into designated cells in a spreadsheet that populates a final database. The resulting workflow will then serve as a basis for exploring how to generate reproducible reports.
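As a rough illustration of the extraction step described above, the following Python sketch pulls labelled values out of plain text and writes them as one spreadsheet row. It is not the project's pipeline: the field labels, the `Label: value` layout, and the function names `extract_fields` and `rows_to_csv` are all hypothetical, and real Recovery Documents would first need to be converted from PDF or Word to text.

```python
import csv
import io
import re

# Hypothetical field labels; the real document templates may use different ones.
FIELDS = ["Species", "Recovery Measure", "Priority", "Timeline"]


def extract_fields(text, fields=FIELDS):
    """Map each label to the text that follows 'Label:' on the same line.

    Labels that are not found are recorded as empty strings so that every
    row has the same columns.
    """
    row = {}
    for label in fields:
        pattern = rf"^{re.escape(label)}[ \t]*:[ \t]*(.+)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        row[label] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows, fields=FIELDS):
    """Serialise a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample text standing in for one extracted document section.
SAMPLE_TEXT = """Species: Example species
Recovery Measure: Monitor population at known sites
Priority: High
"""

if __name__ == "__main__":
    print(rows_to_csv([extract_fields(SAMPLE_TEXT)]))
```

Recording missing labels as empty cells, rather than skipping them, keeps the columns aligned and makes gaps in a document easy to spot during review.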

Outcomes of this project will include:

  1. Test and evaluate the use of Reproducible Analytical Pipelines to automate data extraction from SARA recovery documents and the Species at Risk geodatabase.
  2. Make recommendations regarding the document elements (e.g. formatting or codes) required to identify and extract relevant information into a spreadsheet. This will coincide with the Species at Risk Program recovery and implementation team's plans to revise Species' Progress Report templates and other recovery document templates this fiscal year.
  3. Develop a conceptual workflow of the elements required to reverse-engineer outcomes 1 and 2 (above) by using forms, CSV files, or Excel spreadsheets to generate reproducible reports (e.g. using Microsoft Power BI, R Markdown, or other tools that are accessible to and easily used by Species at Risk Program staff).
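The reverse direction in outcome 3, going from a spreadsheet to a report, could look roughly like the Python sketch below. It reads CSV text and renders a short Markdown summary grouped by species. The column names (`species`, `measure`, `status`) and the function name `render_report` are assumptions made for illustration, and the project may well use R Markdown or Power BI for this step instead.

```python
import csv
import io
from collections import defaultdict


def render_report(csv_text):
    """Render a Markdown summary from CSV text.

    Assumes hypothetical columns named 'species', 'measure', and 'status'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))

    # Group rows by species so each species gets its own section.
    by_species = defaultdict(list)
    for row in reader:
        by_species[row["species"]].append(row)

    lines = ["# Recovery measures summary", ""]
    for species in sorted(by_species):
        lines.append(f"## {species}")
        for row in by_species[species]:
            lines.append(f"- {row['measure']} (status: {row['status']})")
        lines.append("")
    return "\n".join(lines)


# Made-up sample data; not taken from any real recovery document.
SAMPLE_CSV = """species,measure,status
Species B,Survey habitat,Not started
Species A,Monitor population,In progress
Species A,Engage stakeholders,Completed
"""

if __name__ == "__main__":
    print(render_report(SAMPLE_CSV))
```

Because the report is generated entirely from the CSV, rerunning the script after the spreadsheet changes reproduces the report without manual editing.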

Handy Links:

Project Folder

Project Proposal

R Installation

install.packages("remotes")
remotes::install_deps()

Adding a package:

usethis::use_package("packageName")

After adding a new package, commit the updated DESCRIPTION file to source control.

Python Installation

Create a virtual environment from a terminal: python -m venv venv

Activate it (source venv/bin/activate on macOS or Linux, venv\Scripts\activate on Windows), or select it as the Python interpreter under project options.

Install the needed packages: pip install -r requirements.txt