Data pipelines for extraction, transformation and visualization of architectural visuals in Python. VisArchPy extracts images embedded in PDF files, collects relevant metadata, and extracts visual features using the DinoV2 model. We aim to make this package an AI-powered tool with features for recognizing different types of architectural visuals (types of buildings, structures, etc.). The package is still in development, and we are working on adding more features and improving the existing ones. If you have any suggestions or questions, please open an issue in our GitHub repository.
Viz: a utility to create a bounding box plot. This plot provides an overview of the shapes and sizes of images in a dataset.
After installing the dependencies, install VisArchPy using pip:
pip install visarchpy
git clone https://github.com/AiDAPT-A/VisArchPy.git
cd VisArchPy/
Install the package using pip:
pip install .
Developers who intend to modify the source code can install additional dependencies for testing and documentation as follows.
Go to the root directory visarchpy/
Run:
pip install -e .[dev]
VisArchPy provides a command-line interface to access its functionality. If you want to use VisArchPy as a Python package, consult the documentation.
visarch -h
visarch [PIPELINE] [SUBCOMMAND]
For example, to run the layout pipeline using a single PDF file, do the following:
visarch layout from-file <path-to-pdf-file> <path-output-directory>
Use visarch [PIPELINE] [SUBCOMMAND] -h for help.
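For batch jobs it can be convenient to call the command-line interface from a Python script. The following is a minimal sketch that relies only on the command shape shown above; the helper names build_visarch_command and run_visarch are hypothetical and are not part of VisArchPy.

```python
import shutil
import subprocess


def build_visarch_command(pipeline, subcommand, *args):
    """Return the argument list for `visarch [PIPELINE] [SUBCOMMAND] ...`."""
    return ["visarch", pipeline, subcommand, *[str(a) for a in args]]


def run_visarch(pipeline, subcommand, *args):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    # Fail early with a clear message if the package is not installed.
    if shutil.which("visarch") is None:
        raise FileNotFoundError("the 'visarch' executable is not on PATH")
    command = build_visarch_command(pipeline, subcommand, *args)
    return subprocess.run(command, check=True)


# Show the command that would be executed (the paths are placeholders).
print(build_visarch_command("layout", "from-file", "plans.pdf", "results"))
```

Calling run_visarch("layout", "from-file", "plans.pdf", "results") would then execute the same command as the example above, with plans.pdf and results standing in for real paths.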
Results from the data extraction pipelines (Layout, OCR, LayoutOCR) are saved to the output directory. Results are organized as follows:
00000/ # results directory
├── pdf-001 # directory where images are saved to. One per PDF file
├── 00000-metadata.csv # extracted metadata as CSV
├── 00000-metadata.json # extracted metadata as JSON
├── 00000-settings.json # settings used by pipeline
└── 00000.log # log file
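The layout above can be read back with the Python standard library. This is a sketch under two assumptions taken from the listing: that file names are prefixed with the name of the results directory, and that every subdirectory holds the images of one PDF file. The function load_results is hypothetical, and the metadata fields are returned as found, since they are not described here.

```python
import csv
import json
from pathlib import Path


def load_results(results_dir):
    """Load metadata, settings and image directories from a results directory."""
    results_dir = Path(results_dir).resolve()
    prefix = results_dir.name  # assumed file prefix, e.g. "00000"

    with open(results_dir / f"{prefix}-metadata.json", encoding="utf-8") as f:
        metadata = json.load(f)
    with open(results_dir / f"{prefix}-settings.json", encoding="utf-8") as f:
        settings = json.load(f)
    # Keep the CSV rows raw; no assumption is made about the columns.
    with open(results_dir / f"{prefix}-metadata.csv", newline="", encoding="utf-8") as f:
        csv_rows = list(csv.reader(f))

    # Assumed: one subdirectory of extracted images per PDF file.
    image_dirs = sorted(p for p in results_dir.iterdir() if p.is_dir())
    return {
        "metadata": metadata,
        "settings": settings,
        "csv_rows": csv_rows,
        "image_dirs": image_dirs,
    }
```

For instance, load_results("00000")["image_dirs"] would list the per-PDF image directories, such as pdf-001 in the listing above.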
The pipeline's settings determine how visual extraction from PDF files is performed. Settings must be passed as a JSON file on the CLI. Settings must include all items listed below. The values shown below are the defaults.
When no settings are passed to a pipeline, the defaults are used. To print the default settings to the terminal, use:
visarch [PIPELINE] settings
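A convenient starting point for a custom settings file is a copy of the defaults. The sketch below captures whatever the settings subcommand prints and writes it to a file unchanged. The helper names are hypothetical, the exact format of the printed output is not specified here, and the way a settings file is passed to a pipeline should be checked with visarch [PIPELINE] [SUBCOMMAND] -h.

```python
import subprocess
from pathlib import Path


def settings_command(pipeline):
    """Return the argument list for `visarch [PIPELINE] settings`."""
    return ["visarch", pipeline, "settings"]


def save_default_settings(pipeline, destination):
    """Write the printed default settings to `destination` verbatim."""
    result = subprocess.run(
        settings_command(pipeline),
        capture_output=True,  # collect what would be printed to the terminal
        text=True,
        check=True,           # raise if the CLI reports an error
    )
    destination = Path(destination)
    destination.write_text(result.stdout, encoding="utf-8")
    return destination
```

With VisArchPy installed, save_default_settings("layout", "layout-settings.json") would produce a file that can be inspected and edited before a run; the output file name is only an example.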
Please cite this software as follows:
Garcia Alvarez, M. G., Khademi, S., & Pohl, D. (2023). VisArchPy [Computer software]. https://github.com/AiDAPT-A/VisArchPy