Snapshot API

The Snapshot API processes PDF documents through a verification system based on the Open Science Indicators (OSI). This project provides a Node.js REST API that implements JWT authentication and integrates with the DataSeer AI "Genshare" API for PDF processing. It features user-specific rate limiting, script-based user management, and secure token handling.

Table of Contents

  1. Features
  2. Prerequisites
  3. Installation
  4. Usage
  5. Error Handling
  6. GenShare Response
  7. Authentication
  8. Project Structure
  9. Configuration Files
  10. Rate Limiting
  11. Logging System
  12. Security Considerations
  13. Contributing
  14. License

Features

Prerequisites

Installation

Using Docker

  1. Clone the repository:

    git clone https://github.com/DataSeer/snapshot-api.git
    cd snapshot-api
  2. Build image:

    docker build -t snapshot-api .
  3. Run container:

    # using default conf & env files
    docker run -d -it -p 3000:3000 --network host --name snapshot-api-instance snapshot-api
    
    # using custom conf & env files
    docker run -d -it -p 3000:3000 --network host --name snapshot-api-instance -v $(pwd)/.env:/usr/src/app/.env -v $(pwd)/conf:/usr/src/app/conf snapshot-api
  4. Interact with the container:

    # open a shell inside the running container
    docker exec -it snapshot-api-instance /bin/bash

Direct Installation

  1. Clone the repository:

    git clone https://github.com/DataSeer/snapshot-api.git
    cd snapshot-api
  2. Install dependencies:

    npm install
  3. Set up configuration:

    • Create conf/genshare.json with your Genshare API details:
      {
        "processPDF": {
          "url": "http://localhost:5000/process/pdf",
          "method": "POST",
          "apiKey": "your_genshare_api_key_for_process_pdf"
        }
      }
    • The conf/users.json file will be created automatically when you add users.
  4. Set environment variables:

    • PORT: The port on which the server will run (default: 3000)
    • JWT_SECRET: Secret key for JWT token generation and validation
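
As an illustration of how these two variables might be consumed at startup, here is a minimal sketch. The `loadConfig` helper is hypothetical and not part of this project; failing fast when JWT_SECRET is missing is an assumption, not documented behavior.

```javascript
// Hypothetical helper (not from this repository): reads the two documented
// variables from an env-like object and applies the documented default port.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // Assumption: refuse to start without a signing secret.
  if (!env.JWT_SECRET) {
    throw new Error('JWT_SECRET must be set');
  }
  return { port, jwtSecret: env.JWT_SECRET };
}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.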

Usage

Starting the Server

To start the server in production mode:

npm start

Managing Users

Use the following command to manage users:

npm run manage-users <command> [userId] [options]

Commands:

Examples:

# Add a new user with custom rate limit
npm run manage-users add user123 '{"max": 200, "windowMs": 900000}'

# Refresh a user's token
npm run manage-users refresh-token user123

# Update a user's rate limit
npm run manage-users update-limit user123 '{"max": 300}'

# List all users
npm run manage-users list

# Remove a user
npm run manage-users remove user123

Rate limits are specified as a JSON object with max (maximum number of requests) and windowMs (time window in milliseconds) properties. If no rate limit is specified when adding a user, it defaults to 100 requests per 15-minute window.
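
The sketch below shows one way the JSON argument could be combined with those defaults. `resolveRateLimit` is a hypothetical helper, and treating a partial object such as '{"max": 300}' as an overlay on the existing limit is an assumption about how update-limit behaves.

```javascript
// Defaults stated in this README: 100 requests per 15-minute window.
const DEFAULT_RATE_LIMIT = { max: 100, windowMs: 15 * 60 * 1000 };

// Hypothetical helper: parses the JSON string given on the command line and
// layers it over a base limit (the defaults, or a user's current limit).
function resolveRateLimit(jsonArg, base = DEFAULT_RATE_LIMIT) {
  const custom = jsonArg ? JSON.parse(jsonArg) : {};
  const merged = { ...base, ...custom };
  for (const key of ['max', 'windowMs']) {
    if (!Number.isInteger(merged[key]) || merged[key] <= 0) {
      throw new Error(`Rate limit "${key}" must be a positive integer`);
    }
  }
  return { max: merged.max, windowMs: merged.windowMs };
}
```

For example, `resolveRateLimit('{"max": 300}', { max: 200, windowMs: 900000 })` keeps the 15-minute window and only raises the request ceiling.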

API Endpoints

All API endpoints require authentication using a JWT token.

For all requests, include the JWT token in the Authorization header:

Authorization: Bearer <your_token>

Example curl commands:

  1. Get API information:

    curl -H "Authorization: Bearer <your_token>" http://localhost:3000/
  2. Process a PDF with options:

    curl -X POST -H "Authorization: Bearer <your_token>" \
     -F "file=@path/to/your/file.pdf" \
     -F 'options={"key":"value","anotherKey":123}' \
     http://localhost:3000/processPDF

Note: Ensure that the options parameter is a valid JSON object. Invalid JSON will result in an error response.
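
The same request can be sent from Node.js. The following is a sketch only, assuming Node.js 18 or later (global fetch, FormData and Blob); `buildProcessPdfForm` and `processPdf` are hypothetical names, and the response is read as text because its format is not described here.

```javascript
// Hypothetical helper: builds the multipart body for POST /processPDF.
// Serializing with JSON.stringify guarantees "options" is valid JSON.
function buildProcessPdfForm(pdfBytes, options, filename = 'file.pdf') {
  const form = new FormData();
  form.set('file', new Blob([pdfBytes], { type: 'application/pdf' }), filename);
  form.set('options', JSON.stringify(options));
  return form;
}

// Hypothetical client call; baseUrl would be e.g. "http://localhost:3000".
async function processPdf(baseUrl, token, pdfBytes, options) {
  const response = await fetch(`${baseUrl}/processPDF`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
    body: buildProcessPdfForm(pdfBytes, options),
  });
  return { status: response.status, body: await response.text() };
}
```

The Content-Type header is deliberately not set by hand, so that fetch can add the multipart boundary itself.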

Error Handling

"file" errors

HTTP 400: 'Required "file" missing' (parameter not set)

curl -X POST -H "Authorization: Bearer <your_token>" \
     -F 'options={"key":"value","anotherKey":123}' \
     http://localhost:3000/processPDF
# HTTP 400 Bad Request 
Required "file" missing

HTTP 400: 'Required "file" invalid. Must have mimetype "application/pdf".' (file with incorrect mimetype)

curl -X POST -H "Authorization: Bearer <your_token>" \
     -F "file=@path/to/your/file.xml" \
     -F 'options={"key":"value","anotherKey":123}' \
     http://localhost:3000/processPDF
# HTTP 400 Bad Request 
Required "file" invalid. Must have mimetype "application/pdf".

"options" errors

HTTP 400: 'Required "options" missing.' (parameter not set)

curl -X POST -H "Authorization: Bearer <your_token>" \
     -F "file=@path/to/your/file.pdf" \
     http://localhost:3000/processPDF
# HTTP 400 Bad Request 
Required "options" missing.

HTTP 400: 'Required "options" invalid. Must be a valid JSON object.' (data are not JSON)

curl -X POST -H "Authorization: Bearer <your_token>" \
     -F "file=@path/to/your/file.pdf" \
     -F 'options="key value anotherKey 123"' \
     http://localhost:3000/processPDF
# HTTP 400 Bad Request 
Required "options" invalid. Must be a valid JSON object.

HTTP 400: 'Required "options" invalid. Must be a JSON object.' (data are JSON but not an object)

curl -X POST -H "Authorization: Bearer <your_token>" \
     -F "file=@path/to/your/file.pdf" \
     -F 'options=["key","value","anotherKey",123]' \
     http://localhost:3000/processPDF
# HTTP 400 Bad Request 
Required "options" invalid. Must be a JSON object.
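
The checks above can be summarized in code. This is a minimal sketch, not the project's implementation: `validateProcessPdfInput` is a hypothetical function, the order of the checks is an assumption, and `file.mimetype` assumes the field name used by common multipart parsers.

```javascript
// Hypothetical validator mirroring the error messages documented above.
// `file` is the uploaded file descriptor; `options` is the raw form string.
function validateProcessPdfInput({ file, options } = {}) {
  const fail = (message) => ({ ok: false, status: 400, message });

  if (!file) return fail('Required "file" missing');
  if (file.mimetype !== 'application/pdf') {
    return fail('Required "file" invalid. Must have mimetype "application/pdf".');
  }
  if (options === undefined) return fail('Required "options" missing.');

  let parsed;
  try {
    parsed = JSON.parse(options);
  } catch {
    // The string is not JSON at all.
    return fail('Required "options" invalid. Must be a valid JSON object.');
  }
  // Valid JSON, but an array, null, number or string rather than an object.
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return fail('Required "options" invalid. Must be a JSON object.');
  }
  return { ok: true, options: parsed };
}
```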

GenShare Response

More info available here

Authentication

The API uses JSON Web Tokens (JWT) for authentication, implemented through several components:

Token Management

Token Lifecycle

  1. Creation: Tokens are generated using:

    npm run manage-users add <userId>

    This creates a JWT signed with the application's secret key containing the user ID.

  2. Usage: Include token in requests:

    curl -H "Authorization: Bearer <your_token>" http://localhost:3000/endpoint
  3. Validation: Each request is authenticated by:

    • Extracting token from Authorization header
    • Verifying JWT signature
    • Looking up associated user
    • Checking rate limits
  4. Renewal: Refresh expired tokens using:

    npm run manage-users refresh-token <userId>
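
To make the lifecycle concrete, here is a self-contained sketch of signing and validating a token using only Node.js built-ins. It assumes the HS256 algorithm and a payload field named `id`, neither of which is confirmed by this README; the function names are hypothetical, and the rate-limit and expiry checks are left out.

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

const toBase64Url = (text) => Buffer.from(text, 'utf8').toString('base64url');

// Creation: header.payload.signature, signed with HMAC-SHA256 (assumed HS256).
function signToken(payload, secret) {
  const header = toBase64Url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const signingInput = `${header}.${toBase64Url(JSON.stringify(payload))}`;
  const signature = createHmac('sha256', secret).update(signingInput).digest('base64url');
  return `${signingInput}.${signature}`;
}

// Validation: recompute the signature and compare in constant time.
// Returns the payload, or null if the token is malformed or forged.
function verifyToken(token, secret) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(signature, 'base64url');
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  try {
    const { alg } = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
    if (alg !== 'HS256') return null;
    return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
}

// Extract the Bearer token, verify it, then look up the user.
// `users` is a hypothetical Map keyed by user ID.
function authenticate(authorizationHeader, secret, users) {
  const match = /^Bearer (.+)$/.exec(authorizationHeader ?? '');
  if (!match) return null;
  const payload = verifyToken(match[1], secret);
  if (!payload || !users.has(payload.id)) return null;
  return payload.id;
}
```

In a real deployment a maintained JWT library is usually preferable to hand-rolled verification; the sketch only shows what the validation steps involve.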

User Management Commands

# Generate new user with token
npm run manage-users add user123

# List all users and their tokens
npm run manage-users list

# Refresh token for existing user
npm run manage-users refresh-token user123

# Remove user and invalidate token
npm run manage-users remove user123

Security Features

Project Structure

Configuration Files

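The application uses two main configuration files:

  • conf/genshare.json: Genshare API details (URL, method, API key)
  • conf/users.json: user records, created automatically when you add users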

Make sure to keep these files secure and do not commit them to version control.

Rate Limiting

This API implements user-specific rate limiting:

Rate limiting is implemented in src/utils/rateLimiter.js and can be further customized as needed.
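
For readers unfamiliar with per-user limits, the following sketch shows a simple fixed-window counter driven by the max and windowMs values. It is an illustration under assumptions, not the contents of src/utils/rateLimiter.js; the class name and the in-memory Map are hypothetical.

```javascript
// Hypothetical in-memory fixed-window limiter keyed by user ID.
class FixedWindowLimiter {
  constructor() {
    this.windows = new Map(); // userId -> { start, count }
  }

  // Returns true if the request fits within the user's limit.
  // `now` is injectable so the behavior can be checked deterministically.
  allow(userId, { max, windowMs }, now = Date.now()) {
    let current = this.windows.get(userId);
    if (!current || now - current.start >= windowMs) {
      // First request, or the previous window has elapsed: start a new one.
      current = { start: now, count: 0 };
      this.windows.set(userId, current);
    }
    if (current.count >= max) return false;
    current.count += 1;
    return true;
  }
}
```

Each user gets an independent counter, so one client exhausting its quota does not affect the others.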

Logging System

The API implements comprehensive logging using Winston and Morgan:

Log Format

Log Analysis

The project includes a log analysis script that provides detailed statistics about API usage:

# analyze log/combined.log file
npm run analyze-logs

# analyze a given log file
node scripts/analyze_logs.js [path/to/logfile]

The analyzer provides:

Example output:

Request Statistics:

USERS Statistics:
User: user123
  Total Requests: 150
  Successful Requests: 145
  Overall Success Rate: 96.67%
  URL Breakdown:
    URL: /processPDF
      Total Requests: 120
      Successful Requests: 118
      Success Rate: 98.33%

IPS Statistics:
IP: 192.168.1.1
  Total Requests: 75
  Successful Requests: 70
  Overall Success Rate: 93.33%
  ...
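
The percentages in this output are successful requests divided by total requests, rounded to two decimals. A minimal sketch of that aggregation follows; the entry shape (`user`, `ip`, `success`) and the `summarize` function are hypothetical, and how the real analyzer decides that a request succeeded is not specified here.

```javascript
// Hypothetical aggregation: groups entries by `key` ("user" or "ip") and
// computes the success rate as a percentage string with two decimals.
function summarize(entries, key) {
  const stats = new Map();
  for (const entry of entries) {
    const group = stats.get(entry[key]) ?? { total: 0, successful: 0 };
    group.total += 1;
    if (entry.success) group.successful += 1;
    stats.set(entry[key], group);
  }
  for (const group of stats.values()) {
    group.successRate = `${((group.successful / group.total) * 100).toFixed(2)}%`;
  }
  return stats;
}
```

With 145 successful requests out of 150, this yields 96.67%, matching the figure shown for user123.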

Security Considerations

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.