Open ItsMeBrianD opened 1 year ago
@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?
Which version does it work with? We can check for changes
As far as I know, Vercel runs on AWS Lambda, so I'm having a hard time imagining that this has worked before: Lambda environments are currently based on Amazon Linux 2, which uses GLIBC 2.26. See https://repost.aws/questions/QUrXOioL46RcCnFGyELJWKLw/glibc-2-27-on-amazon-linux-2
I guess you could download my DuckDB for Lambda layer, and extract the build artifacts: https://github.com/tobilg/duckdb-nodejs-layer#arns
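If you want to verify which GLIBC version a given runtime actually ships, here is a quick sketch; it assumes a glibc-based Linux image (on musl-based images such as Alpine, ldd reports musl instead):

```ts
// Logs the libc version of the environment this code runs in.
import { execSync } from "child_process";

const libcInfo = execSync("ldd --version").toString().split("\n")[0];
console.log("Runtime libc:", libcInfo); // e.g. "ldd (GNU libc) 2.26" on Amazon Linux 2
```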
Experiencing a similar error on Vercel with both Node 18.x and 16.x.
I therefore created https://www.npmjs.com/package/duckdb-lambda-x86, which should solve the actual issue.
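For reference, a minimal sketch of using that package directly, assuming it exposes the same API as the regular duckdb Node bindings (the wrapper further down in this thread uses it the same way):

```ts
// Minimal sketch -- assumes duckdb-lambda-x86 mirrors the duckdb package's API
import * as duckdb from "duckdb-lambda-x86";

const db = new duckdb.Database(":memory:");
const connection = db.connect();

connection.all("SELECT 42 AS answer", (err: any, rows: any) => {
  if (err) throw err;
  console.log(rows); // [ { answer: 42 } ]
});
```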
@archiewood any updates?
I've encountered the same problem as described. Specifically, I'm using duckdb@0.7.1.
Environment: node:14

Steps to Reproduce:

```bash
docker run --rm -it node:14 bash

# inside the node:14 container
mkdir app && cd app
yarn init -y
yarn add duckdb@0.7.1
cd node_modules/duckdb
npm test
```
Are there any necessary packages that I need to install?
Translated by ChatGPT. Sorry, my English is not good. I hope there's no offense.
@hanshino the default duckdb npm package will not work IMO due to GLIBC incompatibilities, as described above. For Lambda usage, I maintain the https://www.npmjs.com/package/duckdb-lambda-x86 package, which should fix your issues.
Here's a wrapper over duckdb-async and duckdb-lambda-x86 that I just wrote, which seems to work both on my M1 MacBook (which requires duckdb-async) and on an EC2 instance where I was previously hitting the GLIBC_2.29 error (where duckdb-lambda-x86 works instead):
```ts
// lib/duckdb.ts
// Try the regular duckdb-async build first; if it fails to load
// (e.g. GLIBC mismatch), fall back to duckdb-lambda-x86.
let _query: Promise<(query: string) => any>

_query = import("duckdb-async")
  .then(duckdb => duckdb.Database)
  .then(Database => Database.create(":memory:"))
  .then((db: any) => (query: string) => db.all(query))
  .catch(async error => {
    console.log("duckdb init error:", error)
    const duckdb = await import("duckdb-lambda-x86");
    const Database: any = duckdb.Database;
    const db = new Database(":memory:")
    const connection = db.connect()
    return (query: string) => {
      return new Promise((resolve, reject) => {
        connection.all(query, (err: any, res: any) => {
          if (err) reject(err);
          else resolve(res);
        })
      })
    }
  })

export { _query }
```
Sample API endpoint that uses it:
```ts
// /api/query.ts
import { _query } from "@/lib/duckdb"
import { NextApiRequest, NextApiResponse } from "next";

// Convert BigInts to numbers
function replacer(key: string, value: any) {
  if (typeof value === 'bigint') {
    return Number(value)
  } else {
    return value;
  }
}

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse,
) {
  const { body: { path } } = req
  const query = await _query
  const rows = await query(`select * from read_parquet("${path}")`) // 🚨 unsafe / SQLi 🚨
  res.status(200).send(JSON.stringify(rows, replacer))
}
```
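The interpolation flagged with 🚨 above deserves tightening before this goes anywhere near production. Here is a minimal sketch of one way to narrow it with plain input validation; the helper name and allow-list pattern are illustrative, not part of the original endpoint:

```ts
// Hypothetical guard for the user-supplied `path` before it is interpolated into SQL.
// The pattern is an assumption -- adjust it to whatever your parquet locations look like.
function assertSafeParquetPath(path: unknown): string {
  if (typeof path !== "string" || !/^[\w\-./:]+\.parquet$/.test(path)) {
    throw new Error("Rejected suspicious parquet path");
  }
  return path;
}

// Usage inside the handler above:
// const rows = await query(`select * from read_parquet('${assertSafeParquetPath(path)}')`)
```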
FYI for others who run into this: I ended up using @tobilg's duckdb-lambda-x86 to resolve this with Vercel. In my case I'm just replacing the default duckdb.node binary with the duckdb-lambda-x86 version in the CI build.
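A sketch of what such a CI step could look like; the source path inside duckdb-lambda-x86 is an assumption (check the package contents), while the target path comes from the GLIBC error message in this issue:

```ts
// scripts/patch-duckdb-binary.ts -- hypothetical post-install step, run after `npm install`
import { copyFileSync, existsSync } from "fs";
import { join } from "path";

// Assumed location of the prebuilt Lambda-compatible binary
const source = join("node_modules", "duckdb-lambda-x86", "duckdb.node");
// Path reported in the GLIBC error for the default duckdb package
const target = join("node_modules", "duckdb", "lib", "binding", "duckdb.node");

if (!existsSync(source)) {
  throw new Error(`Expected prebuilt binary at ${source}; check the duckdb-lambda-x86 package layout`);
}

copyFileSync(source, target);
console.log(`Replaced ${target} with the Lambda-compatible build`);
```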
@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get to work for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.
Ideally running DuckDB in a Lambda should be easy out of the box as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.
Even with replacing the binaries, I am getting the following issue on version 1.0.0 (I am on Vercel, Node.js 20):

```
Unhandled Rejection: [Error: IO Error: Can't find the home directory at '' Specify a home directory using the SET home_directory='/path/to/dir' option.] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'IO' }
```

Setting a home directory also results in an error:

```
Error: TypeError: Failed to set configuration option home_directory: Invalid Input Error: Could not set option "home_directory" as a global option at new Database (/var/task/node_modules/duckdb-async/dist/duckdb-async.js:226:19)
```

Can anyone help me please? Thank you!
@Dev-rick this worked for me on AWS Lambda!
Like @iku000888, I do the following when creating a DB, which seems to work:
```ts
import { tmpdir } from "os";

const db = Database.create(":memory:");
const tempDirectory = tmpdir() || '/tmp';
await (await db).exec(`
  SET home_directory='${tempDirectory}';
  -- ... other settings here
`);
```
@iku000888 and @michaelwallabi Thanks for the input!
Unfortunately I am now getting the following error (on Vercel); locally everything works fine with the same env variables:

```
Error: HTTP Error: HTTP GET error on 'https://XXX.s3.amazonaws.com/XXX.parquet' (HTTP 400)] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'HTTP' }
```
My code is:
```ts
import { Database } from "duckdb-async"

const S3_LAKE_BUCKET_NAME = process.env.S3_LAKE_BUCKET_NAME
const AWS_S3_ACCESS_KEY = process.env['AWS_S3_ACCESS_KEY']
const AWS_S3_SECRET_KEY = process.env['AWS_S3_SECRET_KEY']
const AWS_S3_REGION = process.env['AWS_S3_REGION']

const retrieveDataFromParquet = async ({
  key,
  sqlStatement,
  tableName,
}: {
  key: string
  sqlStatement: string
  tableName: string
}) => {
  try {
    // Create a new DuckDB database connection
    const db = await Database.create(':memory:')

    console.log('Setting home directory...')
    await db.all(`SET home_directory='/tmp';`)

    console.log('Installing and loading httpfs extension...')
    await db.all(`
      INSTALL httpfs;
      LOAD httpfs;
    `)

    console.log('Setting S3 credentials...')
    await db.all(`
      SET s3_region='${AWS_S3_REGION}';
      SET s3_access_key_id='${AWS_S3_ACCESS_KEY}';
      SET s3_secret_access_key='${AWS_S3_SECRET_KEY}';
    `)

    // Test S3 access
    console.log('Testing S3 access...')
    try {
      const testResult = await db.all(`
        SELECT * FROM parquet_metadata('s3://${S3_LAKE_BUCKET_NAME}/${key}');
      `)
      console.log('S3 access test result successfully loaded:', testResult.length, 'rows')
    } catch (s3Error) {
      console.error('Error testing S3 access:', s3Error)
      throw s3Error // Rethrow the error to stop execution
    }

    // Try to read file info without actually reading the file
    console.log('Checking file info...')
    try {
      const fileInfo = await db.all(`
        SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}') LIMIT 0;
      `)
      console.log('File info loaded:', fileInfo.length, 'rows')
    } catch (fileError) {
      console.error('Error checking file info:', fileError)
    }

    // If everything above works, try creating the table
    console.log('Creating table...')
    await db.all(
      `CREATE TABLE ${tableName} AS SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}');`,
    )
    console.log('Table created successfully')

    // Execute the query before closing the connection
    const result = await db.all(sqlStatement)

    // Close the database connection
    await db.close()

    // Return the result
    return result as { [k: string]: any }[]
  } catch (error) {
    console.error('Error:', error)
    return null
  }
}
```
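For context, a usage sketch for the function above; the key, table name, and statement are placeholders:

```ts
// Hypothetical invocation -- values are illustrative only
const rows = await retrieveDataFromParquet({
  key: "exports/2024/orders.parquet",
  tableName: "orders",
  sqlStatement: "SELECT count(*) AS row_count FROM orders",
})

if (rows === null) {
  console.error("Query failed; see the logs above")
} else {
  console.log(rows) // count(*) comes back as a BigInt, hence the replacer trick earlier in this thread
}
```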
Have a look at my implementation at https://github.com/tobilg/serverless-duckdb/blob/main/src/lib/awsSecret.ts, and at https://github.com/tobilg/serverless-duckdb/blob/main/src/functions/queryS3Express.ts#L95 where it is triggered before any access to S3.
Hint: IMO you also need to pass the SESSION_TOKEN, and possibly the ENDPOINT as well if you're using S3 Express One Zone.
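For completeness, a sketch of those extra settings on top of the SET statements already shown above; the environment variable names are placeholders, while s3_session_token and s3_endpoint are standard DuckDB httpfs options:

```ts
// Sketch only -- add alongside the existing SET statements when using temporary
// credentials (session token) and/or a non-default S3 endpoint.
await db.all(`
  SET s3_session_token='${process.env.AWS_S3_SESSION_TOKEN}';
  SET s3_endpoint='${process.env.AWS_S3_ENDPOINT}';
`)
```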
I'm wondering why you're seeing a 400 status (invalid request), and not a 403 status though.
> @michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get to work for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.
Thank you, appreciate the feedback!
> Ideally running DuckDB in a Lambda should be easy out of the box as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.

This is honestly not a "fault" of DuckDB, but of AWS using very outdated GLIBC versions in any Node runtime before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported), as Node 20 now uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, afaik.
> This is honestly not a "fault" of DuckDB, but of AWS using very outdated GLIBC versions in any Node runtime before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported), as Node 20 now uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, afaik.
Oh hm that is interesting. I thought I was running my lambdas on Node 20 and was getting ELF errors, so either AL 2023 still has issues or I'm not on Node 20 🤔
What happens?
When attempting to deploy a JavaScript project to Vercel that leverages SSR and DuckDB, the build fails.
The error message presented by DuckDB is:

```
/lib64/libm.so.6: version 'GLIBC_2.29' not found (required by /vercel/path0/node_modules/duckdb/lib/binding/duckdb.node)
```

This has worked previously.
To Reproduce
This repo has a simple reproduction of the issue: https://github.com/ItsMeBrianD/duckdb-vercel-repro. Simply create a Vercel project based on it (or a fork), and the build will fail with the error message above.
OS: Vercel
DuckDB Version: 0.7.1
DuckDB Client: node
Full Name: Brian Donald
Affiliation: Evidence
Have you tried this on the latest master branch? Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?