augustin-wien / PDF-Parser

This plugin is used in a WordPress backend to extract text from PDF files and convert it into blog posts.

PDF-Parser

Software for parsing print-ready newspaper PDFs. After the articles on each page have been extracted, each article is published via the WordPress API.

Installation guide

LocalWP

Install: First, for a local development setup, install LocalWP.

Create website: Second, create a local WordPress website so that the parser can publish posts to it. A detailed how-to video is available for more information on LocalWP. Important: Remember your username and site URL, since the parser needs these credentials to communicate with WordPress.

Generate Application Password: Third, in your WordPress backend, navigate to Users > Profile and scroll down to the “Application Passwords” heading. Add a new Application Password and copy it to a safe place such as your password manager. Detailed instructions are available in both written and video form.

Create several categories: In your WordPress backend, open the 'Posts' tab and select the 'Categories' submenu below it. Now create the following categories one by one, without the quotation marks: "editorial","augustiner:in","einsicht","das wahre leben","cover","cover","cover","cover","tun & lassen","tun & lassen","tun & lassen","tun & lassen","vorstadt","vorstadt","lokalmatador:in nº","vorstadt","art.ist.in","art.ist.in","art.ist.in","art.ist.in","dichter innenteil","dichter innenteil","dichter innenteil","dichter innenteil","augustinchen","augustinchen"
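The credential and category steps above can also be scripted against the WordPress REST API. The following is a minimal sketch, not part of this repository: it assumes the site exposes the standard `/wp-json/wp/v2/` routes and accepts the Application Password via HTTP Basic authentication. The function names are illustrative, and the site URL, username, and password are values you supply yourself.

```python
import base64
import json
import urllib.request

# Category names copied from the list above, repeats included.
CATEGORIES = [
    "editorial", "augustiner:in", "einsicht", "das wahre leben",
    "cover", "cover", "cover", "cover",
    "tun & lassen", "tun & lassen", "tun & lassen", "tun & lassen",
    "vorstadt", "vorstadt", "lokalmatador:in nº", "vorstadt",
    "art.ist.in", "art.ist.in", "art.ist.in", "art.ist.in",
    "dichter innenteil", "dichter innenteil", "dichter innenteil", "dichter innenteil",
    "augustinchen", "augustinchen",
]


def unique_in_order(names):
    """Drop repeated names while keeping the order of first appearance."""
    return list(dict.fromkeys(names))


def basic_auth_header(username, app_password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{username}:{app_password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_user_request(site_url, username, app_password):
    """Prepare a GET request for the current user, useful as a credential check."""
    return urllib.request.Request(
        f"{site_url.rstrip('/')}/wp-json/wp/v2/users/me",
        headers={"Authorization": basic_auth_header(username, app_password)},
    )


def build_category_request(site_url, username, app_password, name):
    """Prepare a POST request that creates one category."""
    return urllib.request.Request(
        f"{site_url.rstrip('/')}/wp-json/wp/v2/categories",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, app_password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, first call `send(build_user_request(site_url, username, app_password))` to confirm that the credentials are accepted, then loop over `unique_in_order(CATEGORIES)` and call `send(build_category_request(...))` for each name. The sketch sends each distinct name only once; whether the repeated entries in the list matter to the parser is not something this example settles.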

PDF-Parser

Clone: Clone this repository to your local machine.

Install required packages: We assume you have Python installed and use a virtual environment such as venv. In your virtual environment, run:

pip install -r requirements.txt

Create .env file: In the main directory, copy .env.example to .env. Then adjust all the credentials in .env to match your setup.

Start the app: Next, run in your terminal:

uvicorn main:app --reload

Visit the app: In your browser, visit localhost:8000.

Upload your PDF: In your browser, upload your PDF file and check the results on your local WordPress site.
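If the app does not start or cannot reach WordPress, a common cause is a key that exists in .env.example but is missing from .env. The sketch below compares the two files by key name only. It assumes both files use plain `KEY=value` lines with `#` comments, which is the usual convention but is not confirmed by this README; the helper names are illustrative.

```python
from pathlib import Path


def env_keys(text):
    """Collect the key names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text, env_text):
    """Return the keys defined in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(env_text))


def check_env_files(directory="."):
    """Compare .env.example and .env in the given directory."""
    base = Path(directory)
    example_text = (base / ".env.example").read_text(encoding="utf-8")
    env_text = (base / ".env").read_text(encoding="utf-8")
    return missing_keys(example_text, env_text)
```

Calling `check_env_files()` from the main directory returns a list of key names; an empty list means every key from .env.example is present in .env. The values themselves are not checked.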

Development

VSCode extensions

For development we use VSCode. To keep the code style consistent, we also use the VSCode extensions for Black, Pylint, and Flake8, which the settings below configure.

Please install all of these extensions. Next, open your VSCode user settings (your settings.json) by pressing CTRL + SHIFT + P and selecting Preferences: Open User Settings (JSON). Add the following settings to this file:

  // Python specific settings
  "[python]": {
    "editor.formatOnType": true,
    // Set the python formatter to black
    "editor.defaultFormatter": "ms-python.black-formatter"
  },
  // Python linter settings
  "pylint.args": ["--max-line-length=120"],
  "flake8.args": ["--max-line-length=120"]

Update requirements.txt

If you add a new package to this project, first install pipreqs with:

pip install pipreqs

Then, from the project directory, overwrite the current requirements.txt by running:

pipreqs ./ --force

Important sidenote: pipreqs writes the package name fitz, but our CI needs PyMuPDF to be specified in order to run properly. Please undo the change to this line afterwards.

Finally, review the rest of the file with git to make sure everything is fine, then push it.
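The manual fix from the sidenote above can be sketched as a small helper that puts the previous PyMuPDF line back in place of the fitz line. This is an illustration rather than project tooling: it assumes the committed requirements.txt contains a line starting with `PyMuPDF` and that pipreqs wrote a line starting with `fitz`. The function name is made up for this example.

```python
import re


def restore_pymupdf(new_text, old_text):
    """Replace the fitz line in new_text with the PyMuPDF line from old_text."""
    # Find the line to restore in the previously committed file.
    old_line = next(
        (line for line in old_text.splitlines() if line.lower().startswith("pymupdf")),
        None,
    )
    if old_line is None:
        # Nothing to restore from, so leave the new file untouched.
        return new_text

    restored = []
    for line in new_text.splitlines():
        # Match "fitz" as a whole package name, not as a prefix of another name.
        if re.match(r"fitz(?![\w.-])", line, flags=re.IGNORECASE):
            restored.append(old_line)
        else:
            restored.append(line)
    return "\n".join(restored) + "\n"
```

The old file content can be obtained with `git show HEAD:requirements.txt` after running pipreqs and before committing. The result should still be reviewed with git as described above, since the sketch only touches the one line.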

Linting locally

To run the same lint checks locally as in the GitHub Actions workflow and fix any findings, do the following.

Pylint: To run pylint, use this command:

pylint $(git ls-files '*.py') --max-line-length=120

If pylint is not installed, run:

pip install pylint

Flake8: To run flake8, use these commands:

flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

If flake8 is not installed, run:

pip install flake8

Black: To run the black formatter in check mode, use this command:

black --check .

If black is not installed, run:

pip install black
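For convenience, the lint commands above can be run in one go from a short Python script. This is a sketch under the assumption that pylint, flake8, black, and git are all on your PATH; it is not part of the repository, and the function names are illustrative. The command arguments are copied from the commands listed above.

```python
import subprocess


def build_lint_commands(python_files):
    """Return the lint commands from this section as argument lists."""
    return [
        ["pylint", *python_files, "--max-line-length=120"],
        ["flake8", ".", "--count", "--select=E9,F63,F7,F82", "--show-source", "--statistics"],
        ["flake8", ".", "--count", "--exit-zero", "--max-complexity=10",
         "--max-line-length=127", "--statistics"],
        ["black", "--check", "."],
    ]


def tracked_python_files():
    """List the Python files tracked by git, like `git ls-files '*.py'`."""
    result = subprocess.run(
        ["git", "ls-files", "*.py"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def run_all():
    """Run every lint command and return how many exited with a non-zero status."""
    failures = 0
    for command in build_lint_commands(tracked_python_files()):
        print("$", " ".join(command))
        # A missing tool raises FileNotFoundError here; install it with pip first.
        if subprocess.run(command, check=False).returncode != 0:
            failures += 1
    return failures
```

Calling `run_all()` from the project directory prints each command before running it and returns the number of failed checks, so a return value of 0 means all of them passed.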

Testing locally

To run the same tests locally as in the GitHub Actions workflow and fix any failures, do the following.

Pytest: To run pytest, use this command:

pytest -v

If pytest is not installed, run:

pip install pytest