MetadataExtractor is a Flask-based web service that extracts metadata as RDF triples from various file types. It exposes a REST API that receives files and returns metadata in multiple formats.
Dependencies are listed in the installDependencies.sh file.
Python packages are listed in the requirements.txt file.
To install MetadataExtractor, follow these steps:
Clone the repository:
git clone https://github.com/BenediktHeinrichs/MetadataExtractor.git
cd MetadataExtractor
Run the installDependencies.sh script to install the required dependencies and Python packages (Linux):
./installDependencies.sh
If you have Docker installed, you can build and run the service using the provided Dockerfile.
To start the service:
Using Python directly:
python server.py
Using Docker:
docker build -t metadataextractor .
docker run -p 36541:36541 metadataextractor
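After starting the service either way, it can be handy to confirm that something is actually listening on the port before sending files. The following is a minimal sketch, assuming the service runs on localhost and uses port 36541 as in the Docker command above; the helper name `is_service_listening` is hypothetical and not part of MetadataExtractor.

```python
import socket


def is_service_listening(host: str = "localhost", port: int = 36541,
                         timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "listening" if is_service_listening() else "not reachable"
    print(f"MetadataExtractor port 36541 is {state}")
```

This only checks that the TCP port accepts connections; it does not verify that the API itself responds correctly.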
Default configuration values are defined in the defaultConfigs.py module.
Logging is set up by the setDefaultLogging() function.
MAX_CONTENT_LENGTH, METADATA_EXTRACTOR_HOST, and METADATA_EXTRACTOR_PORT can be adjusted as needed.
The current API version is defined by the __version__ attribute within the MetadataExtractor module.
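As an illustration of how such settings could be overridden, here is a minimal sketch that reads the three names from environment variables and falls back to defaults. It assumes the settings are supplied through the environment, which this README does not confirm, and the fallback values and the `load_settings` helper are hypothetical; the real defaults live in defaultConfigs.py.

```python
import os

# Fallback values for this sketch only; they are assumptions, not the project's defaults.
SKETCH_DEFAULTS = {
    "MAX_CONTENT_LENGTH": 16 * 1024 * 1024,  # hypothetical 16 MiB upload limit
    "METADATA_EXTRACTOR_HOST": "0.0.0.0",    # hypothetical bind address
    "METADATA_EXTRACTOR_PORT": 36541,        # port used in the Docker command
}


def load_settings(env=None):
    """Return the settings, taking each value from env when present."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in SKETCH_DEFAULTS.items():
        raw = env.get(name)
        # Convert the string from the environment to the type of the default.
        settings[name] = default if raw is None else type(default)(raw)
    return settings


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key} = {value}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without changing the shell environment.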
The service exposes several endpoints. Requests accept the following parameters:
identifier: A unique identifier for the file.
config: Configuration object for extraction settings. Example value: { "Extractors": { "Text": [ "SummaryExtract" ] } }
creation_date: The file's creation date.
modification_date: The file's modification date.
url: Download URL of the file.
file: The file to be processed.
accept: The Accept header has to be set (default is JSON, recommended is Turtle).
The server uses defined response models to structure the JSON response: the MetadataOutput model for the main endpoint and the Version model for the version endpoint.
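The parameters above could be assembled on the client side roughly as follows. This is a sketch under stated assumptions: the form fields are assumed to be sent as plain strings with config serialized as JSON, the helper `build_extraction_fields` is hypothetical, and the endpoint path is deliberately left out because it is not given here.

```python
import json


def build_extraction_fields(identifier, config=None, creation_date=None,
                            modification_date=None, url=None,
                            accept="text/turtle"):
    """Return (headers, fields) for a metadata extraction request.

    Only parameters that were actually supplied end up in the form fields.
    """
    # Turtle is the recommended response format; "application/json" selects JSON.
    headers = {"Accept": accept}
    fields = {"identifier": identifier}
    if config is not None:
        # Assumption: the config object travels as a JSON string.
        fields["config"] = json.dumps(config)
    optional = {
        "creation_date": creation_date,
        "modification_date": modification_date,
        "url": url,
    }
    for name, value in optional.items():
        if value is not None:
            fields[name] = value
    return headers, fields


if __name__ == "__main__":
    headers, fields = build_extraction_fields(
        "example-file-1",
        config={"Extractors": {"Text": ["SummaryExtract"]}},
    )
    print(headers)
    print(fields)
```

With an HTTP client such as requests, the returned headers and fields could be passed as `headers=` and `data=`, with the file itself attached through `files=` under the name file.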
Contributions are welcome. Check out the Contribution guidelines, and feel free to submit a pull request.
To lint and format the code with ruff:
pip install ruff
ruff check --fix .
ruff format .