The Snapshot API allows PDF documents to be processed through a verification system with respect to the OSI (Open Science Indicators). This project provides a Node.js REST API that implements JWT authentication and integrates with the DataSeer AI "Genshare" API for PDF processing. It features user-specific rate limiting, script-based user management, and secure token handling.
Clone the repository:
git clone https://github.com/DataSeer/snapshot-api.git
cd snapshot-api
Build image:
docker build -t snapshot-api .
Run container:
# using default conf & env files
docker run -d -it -p 3000:3000 --network host --name snapshot-api-instance snapshot-api
# using custom conf & env files
docker run -d -it -p 3000:3000 --network host --name snapshot-api-instance -v $(pwd)/.env:/usr/src/app/.env -v $(pwd)/conf:/usr/src/app/conf snapshot-api
Interact with the container:
# open a shell in the running container
docker exec -it snapshot-api-instance /bin/bash
Clone the repository:
git clone https://github.com/DataSeer/snapshot-api.git
cd snapshot-api
Install dependencies:
npm install
Set up configuration: provide conf/genshare.json with your Genshare API details:
{
  "processPDF": {
    "url": "http://localhost:5000/process/pdf",
    "method": "POST",
    "apiKey": "your_genshare_api_key_for_process_pdf"
  }
}
The conf/users.json file will be created automatically when you add users.
Set environment variables:
PORT: The port on which the server will run (default: 3000)
JWT_SECRET: Secret key for JWT token generation and validation
To start the server in production mode:
npm start
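As an illustration of how the two environment variables above could be read at startup, here is a minimal sketch. The `loadConfig` helper is hypothetical and the project's actual src/config.js may be organized differently; treating JWT_SECRET as required is also an assumption, made because no default is documented for it.

```javascript
// Hypothetical helper, not the project's actual configuration code.
function loadConfig(env = process.env) {
  // PORT is optional and falls back to the documented default of 3000.
  const port = env.PORT === undefined || env.PORT === '' ? 3000 : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }
  // Assumption: JWT_SECRET has no documented default, so it is treated as required.
  if (!env.JWT_SECRET) {
    throw new Error('JWT_SECRET must be set');
  }
  return { port, jwtSecret: env.JWT_SECRET };
}
```

Failing fast on a missing secret avoids starting a server that cannot sign or validate tokens.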
Use the following command to manage users:
npm run manage-users <command> [userId] [options]
Commands:
add [userId] [rateLimit]: Add a new user
remove <userId>: Remove a user
refresh-token <userId>: Refresh a user's token
update-limit <userId> <rateLimit>: Update a user's rate limit
list: List all users
Examples:
# Add a new user with custom rate limit
npm run manage-users add user123 '{"max": 200, "windowMs": 900000}'
# Refresh a user's token
npm run manage-users refresh-token user123
# Update a user's rate limit
npm run manage-users update-limit user123 '{"max": 300}'
# List all users
npm run manage-users list
# Remove a user
npm run manage-users remove user123
Rate limits are specified as a JSON object with max (maximum number of requests) and windowMs (time window in milliseconds) properties. If not specified when adding a user, the rate limit defaults to 100 requests per 15-minute window.
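The following sketch shows one way a rate limit argument could be resolved against the documented default. The `resolveRateLimit` function is hypothetical, and merging a partial object such as `{"max": 300}` with the existing values is an assumption: the update-limit example suggests partial objects are accepted, but the source does not say how missing fields are handled.

```javascript
// Documented default: 100 requests per 15-minute window.
const DEFAULT_RATE_LIMIT = { max: 100, windowMs: 15 * 60 * 1000 };

// Hypothetical helper: parse the CLI argument and merge it over the current limit.
function resolveRateLimit(rawJson, current = DEFAULT_RATE_LIMIT) {
  if (rawJson === undefined) return { ...current };
  const parsed = JSON.parse(rawJson); // throws SyntaxError on malformed JSON
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new TypeError('Rate limit must be a JSON object');
  }
  // Assumption: fields missing from the argument keep their current value.
  const merged = { ...current, ...parsed };
  for (const key of ['max', 'windowMs']) {
    if (!Number.isInteger(merged[key]) || merged[key] < 0) {
      throw new RangeError(`${key} must be a non-negative integer`);
    }
  }
  return { max: merged.max, windowMs: merged.windowMs };
}
```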
All API endpoints require authentication using a JWT token.
GET /: Get information about available API routes
POST /processPDF: Process a PDF file
  file: PDF file
  options: JSON string of processing options
The options parameter must be a valid JSON object. If it's not well-formed or is not a valid JSON object, the API will return a 400 Bad Request error.
For all requests, include the JWT token in the Authorization header:
Authorization: Bearer <your_token>
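To make the header format concrete, here is a small sketch of how a server could pull the token out of an Authorization header. The `extractBearerToken` function is a hypothetical name for illustration and is not taken from the project's middleware.

```javascript
// Hypothetical helper: returns the token from "Bearer <token>", or null otherwise.
function extractBearerToken(headerValue) {
  if (typeof headerValue !== 'string') return null;
  // Accept the "Bearer" scheme followed by a single whitespace-free token.
  const match = /^Bearer\s+(\S+)$/i.exec(headerValue.trim());
  return match ? match[1] : null;
}
```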
Example curl commands:
Get API information:
curl -H "Authorization: Bearer <your_token>" http://localhost:3000/
Process a PDF with options:
curl -X POST -H "Authorization: Bearer <your_token>" \
-F "file=@path/to/your/file.pdf" \
-F 'options={"key":"value","anotherKey":123}' \
http://localhost:3000/processPDF
Note: Ensure that the options parameter is a valid JSON object. Invalid JSON will result in an error response.
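The same request as the curl command above can be sketched from Node.js. This assumes Node.js 18 or later, where fetch, FormData and Blob are available as globals; the function names and the file.pdf upload name are illustrative, and the response is returned untouched because its format is not described here.

```javascript
// Requires Node.js 18+ (global fetch, FormData and Blob).
// Hypothetical helper: builds the multipart request without sending it.
function buildProcessPdfRequest(pdfBytes, options, token, baseUrl = 'http://localhost:3000') {
  const form = new FormData();
  // The file part must carry the application/pdf mimetype.
  form.set('file', new Blob([pdfBytes], { type: 'application/pdf' }), 'file.pdf');
  // The options part is sent as a JSON string.
  form.set('options', JSON.stringify(options));
  return {
    url: `${baseUrl}/processPDF`,
    // Content-Type is left unset so fetch can add the multipart boundary itself.
    init: { method: 'POST', headers: { Authorization: `Bearer ${token}` }, body: form },
  };
}

// Sends the request and throws on any non-2xx status.
async function processPdf(pdfBytes, options, token) {
  const { url, init } = buildProcessPdfRequest(pdfBytes, options, token);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }
  return response;
}
```

The PDF bytes can come from `fs.promises.readFile`, which returns a Buffer that Blob accepts directly.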
If the file parameter is not provided, a 400 Bad Request error is returned.
If the options parameter is not provided or is not a valid JSON object, a 400 Bad Request error is returned with a descriptive message.
HTTP 400: 'Required "file" missing' (parameter not set)
curl -X POST -H "Authorization: Bearer <your_token>" \
-F 'options={"key":"value","anotherKey":123}' \
http://localhost:3000/processPDF
# HTTP 400 Bad Request
Required "file" missing
HTTP 400: 'Required "file" invalid. Must have mimetype "application/pdf".' (file with incorrect mimetype)
curl -X POST -H "Authorization: Bearer <your_token>" \
-F "file=@path/to/your/file.xml" \
-F 'options={"key":"value","anotherKey":123}' \
http://localhost:3000/processPDF
# HTTP 400 Bad Request
Required "file" invalid. Must have mimetype "application/pdf".
HTTP 400: 'Required "options" missing.' (parameter not set)
curl -X POST -H "Authorization: Bearer <your_token>" \
-F "file=@path/to/your/file.pdf" \
http://localhost:3000/processPDF
# HTTP 400 Bad Request
Required "options" missing.
HTTP 400: 'Required "options" invalid. Must be a valid JSON object.' (data are not JSON)
curl -X POST -H "Authorization: Bearer <your_token>" \
-F "file=@path/to/your/file.pdf" \
-F 'options="key value anotherKey 123"' \
http://localhost:3000/processPDF
# HTTP 400 Bad Request
Required "options" invalid. Must be a valid JSON object.
HTTP 400: 'Required "options" invalid. Must be a JSON object.' (data are JSON but not an object)
curl -X POST -H "Authorization: Bearer <your_token>" \
-F "file=@path/to/your/file.pdf" \
-F 'options=["key","value","anotherKey",123]' \
http://localhost:3000/processPDF
# HTTP 400 Bad Request
Required "options" invalid. Must be a JSON object.
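The five error cases above can be summarized as a single validation function. This is a sketch, not the project's controller code: the `validateProcessPdfInput` name, the `mimetype` property on the uploaded file (a convention used by common upload middleware) and the order of the checks are assumptions; only the error messages come from the examples above.

```javascript
// Hypothetical helper: returns the 400 error message, or null when the input is valid.
function validateProcessPdfInput(file, rawOptions) {
  if (!file) {
    return 'Required "file" missing';
  }
  if (file.mimetype !== 'application/pdf') {
    return 'Required "file" invalid. Must have mimetype "application/pdf".';
  }
  if (rawOptions === undefined || rawOptions === '') {
    return 'Required "options" missing.';
  }
  let parsed;
  try {
    parsed = JSON.parse(rawOptions);
  } catch {
    // The string is not JSON at all.
    return 'Required "options" invalid. Must be a valid JSON object.';
  }
  // Valid JSON, but an array, null, or primitive rather than an object.
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return 'Required "options" invalid. Must be a JSON object.';
  }
  return null;
}
```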
The API uses JSON Web Tokens (JWT) for authentication, implemented through several components:
TokenManager: Handles token storage and validation
UserManager: Manages user data and updates
conf/users.json separate from user data for security
Creation: Tokens are generated using:
npm run manage-users add <userId>
This creates a JWT signed with the application's secret key containing the user ID.
Usage: Include the token in requests:
curl -H "Authorization: Bearer <your_token>" http://localhost:3000/endpoint
Validation: Each request is authenticated by:
Renewal: Refresh expired tokens using:
npm run manage-users refresh-token <userId>
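To show what signing a JWT with a secret key and validating it involves, here is a self-contained sketch using only Node's built-in crypto module with HS256. It is not the project's implementation: the source does not say which library or algorithm is used, and the `id` claim name, `signToken` and `verifyToken` are assumptions for illustration. A real deployment would more likely rely on an established JWT library.

```javascript
const crypto = require('node:crypto');

// Encode a string as base64url (requires Node.js 15.7+ / 14.18+).
const b64url = (input) => Buffer.from(input).toString('base64url');

// Hypothetical helper: create an HS256 JWT whose payload carries the user ID.
function signToken(userId, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  // Assumption: the user ID is stored under an "id" claim.
  const payload = b64url(JSON.stringify({ id: userId, iat: Math.floor(Date.now() / 1000) }));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${payload}`)
    .digest('base64url');
  return `${header}.${payload}.${signature}`;
}

// Hypothetical helper: return the decoded payload, or null if the token is invalid.
function verifyToken(token, secret) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = crypto.createHmac('sha256', secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, 'base64url');
  // Compare in constant time; lengths must match before timingSafeEqual is called.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return null;
  }
  try {
    return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
}
```

Because the signature depends on the secret, a token signed under one secret is rejected once the secret changes.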
# Generate new user with token
npm run manage-users add user123
# List all users and their tokens
npm run manage-users list
# Refresh token for existing user
npm run manage-users refresh-token user123
# Remove user and invalidate token
npm run manage-users remove user123
Tokens are signed with the application's secret key (JWT_SECRET environment variable)
src/: Contains the main application code
  server.js: Entry point
  config.js: Configuration management
  middleware/: Custom middleware (e.g., authentication)
  routes/: API route definitions
  controllers/: Request handling logic
  utils/: Utility functions and classes
scripts/: Contains the user management script
conf/: Configuration files
  genshare.json: Genshare API configuration
  users.json: User data storage (managed by scripts)
tmp/: Folder containing temporary files
The application uses two main configuration files:
conf/genshare.json: Contains configuration for the Genshare API integration.
conf/users.json: Stores user data, including tokens and rate limits.
Make sure to keep these files secure and do not commit them to version control.
This API implements user-specific rate limiting:
max: Maximum number of requests allowed in the time window
windowMs: Time window in milliseconds
windowMs to 0 when adding or updating the user
Rate limiting is implemented in src/utils/rateLimiter.js and can be further customized as needed.
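As a rough illustration of per-user limiting with max and windowMs, here is an in-memory fixed-window counter. This is a swapped-in technique chosen for brevity: the source does not describe the algorithm used in src/utils/rateLimiter.js, and the `FixedWindowLimiter` class is hypothetical.

```javascript
// Hypothetical in-memory limiter: one counter per user, reset when the window expires.
class FixedWindowLimiter {
  constructor() {
    this.windows = new Map(); // userId -> { start, count }
  }

  // Returns true if the request fits within the user's { max, windowMs } limit.
  allow(userId, limit, now = Date.now()) {
    let entry = this.windows.get(userId);
    // Start a fresh window on first use or once the previous window has elapsed.
    if (!entry || now - entry.start >= limit.windowMs) {
      entry = { start: now, count: 0 };
      this.windows.set(userId, entry);
    }
    if (entry.count >= limit.max) return false;
    entry.count += 1;
    return true;
  }
}
```

Passing `now` explicitly keeps the class easy to test. State is lost on restart and is not shared between processes, which a production limiter would need to address.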
The API implements comprehensive logging using Winston and Morgan:
The project includes a log analysis script that provides detailed statistics about API usage:
# analyze log/combined.log file
npm run analyze-logs
# analyze a given log file
node scripts/analyze_logs.js [path/to/logfile]
The analyzer provides:
Example output:
Request Statistics:
USERS Statistics:
  User: user123
    Total Requests: 150
    Successful Requests: 145
    Overall Success Rate: 96.67%
    URL Breakdown:
      URL: /processPDF
        Total Requests: 120
        Successful Requests: 118
        Success Rate: 98.33%
IPS Statistics:
  IP: 192.168.1.1
    Total Requests: 75
    Successful Requests: 70
    Overall Success Rate: 93.33%
...
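The per-user figures in the example output can be derived with a simple aggregation. The sketch below assumes log entries have already been parsed into objects with `user` and `status` fields and that a 2xx status counts as successful; both are assumptions, since neither the log format nor the logic of scripts/analyze_logs.js is shown here.

```javascript
// Hypothetical aggregation over already-parsed log records: { user, status }.
function summarizeByUser(records) {
  const stats = new Map();
  for (const { user, status } of records) {
    const entry = stats.get(user) ?? { total: 0, successful: 0 };
    entry.total += 1;
    // Assumption: any 2xx status counts as a successful request.
    if (status >= 200 && status < 300) entry.successful += 1;
    stats.set(user, entry);
  }
  const summary = {};
  for (const [user, { total, successful }] of stats) {
    summary[user] = {
      total,
      successful,
      // Percentage rounded to two decimals, matching the report's format.
      successRate: `${((successful / total) * 100).toFixed(2)}%`,
    };
  }
  return summary;
}
```

With 145 successful requests out of 150, this yields a success rate of 96.67%, matching the user123 entry above.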
Ensure that configuration files (users.json and genshare.json) are not committed to version control
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE.md file for details.