LibertyDSNP / parquetjs

Fully asynchronous, pure JavaScript implementation of the Parquet file format with additional features
MIT License

[HELP REQUESTED] Creating a parquet file in a chrome extension does not work #146

Open YannHulot opened 1 month ago

YannHulot commented 1 month ago

This issue is something of a message in a bottle. I don't think there is anything wrong with the dependency, but I am looking for help. If this is not the right place, then by all means please close the issue.

Basically, I am trying to create a Parquet file from a browser extension. Browser extensions do not have access to the Node APIs, so I opted to use the browser version of this dependency.

Steps to reproduce

The extension is written in TypeScript.

This is my tsconfig:

{
  "compilerOptions": {
    "outDir": "./dist/",
    "sourceMap": true,
    "module": "esnext",
    "moduleResolution": "node",
    "target": "ES2020",
    "incremental": true,
    "strict": true /* Enable all strict type-checking options. */,
    "noImplicitAny": true /* Raise error on expressions and declarations with an implied 'any' type. */,
    "strictNullChecks": true /* Enable strict null checks. */,
    "strictFunctionTypes": true /* Enable strict checking of function types. */,
    "strictBindCallApply": true /* Enable strict 'bind', 'call', and 'apply' methods on functions. */,
    "strictPropertyInitialization": true /* Enable strict checking of property initialization in classes. */,
    "noImplicitThis": true /* Raise error on 'this' expressions with an implied 'any' type. */,
    "alwaysStrict": true /* Parse in strict mode and emit "use strict" for each source file. */ /* Additional Checks */,
    "noUnusedLocals": true /* Report errors on unused locals. */,
    "noUnusedParameters": true /* Report errors on unused parameters. */,
    "noImplicitReturns": true /* Report error when not all code paths in function return a value. */,
    "noFallthroughCasesInSwitch": true /* Report errors for fallthrough cases in switch statement. */,
    "noUncheckedIndexedAccess": true /* Include 'null' in index signature results */,
    "noImplicitOverride": true /* Ensure overriding members in derived classes are marked with an 'override' modifier. */,
    "noPropertyAccessFromIndexSignature": true,
    "lib": [
      "dom",
      "dom.iterable",
      "ESNext",
    ],
    "allowJs": true,
    "allowSyntheticDefaultImports": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "esModuleInterop": true,
    "resolveJsonModule": true,
    "isolatedModules": true,
    "downlevelIteration": true,
    "jsx": "react-jsx"
  },
  "include": [
    "./src/**/*"
  ],
  "exclude": [
    "./node_modules/**/*",
    "./dist/**/*",
    "./build/**/*",
    "./public/**/*",
    "./test-files/**/*",
  ]
}

This is my current webpack config:

const BufferPlugin = new webpack.ProvidePlugin({
  process: 'process/browser',
  Buffer: ['buffer', 'Buffer'],
  react: 'React',
})

const config = {
  mode: 'production',
  entry: {
    content: path.join(__dirname, 'src/content.ts'),
    background: path.join(__dirname, 'src/background.ts'),
  },
  output: { path: path.join(__dirname, 'dist'), filename: '[name].js', publicPath: '' },
  module: {
    rules: [
      {
        // Match `.js`, `.jsx`, `.ts` or `.tsx` files
        test: /\.[jt]sx?$/,
        loader: 'esbuild-loader',
        options: {
          // JavaScript version to compile to
          target: 'es2020',
        },
      },
      {
        test: /\.css$/,
        use: ['style-loader', 'css-loader'],
        exclude: /\.module\.css$/,
      },
      {
        test: /\.css$/,
        use: [
          'style-loader',
          {
            loader: 'css-loader',
            options: {
              importLoaders: 1,
              modules: true,
            },
          },
        ],
        include: /\.module\.css$/,
      },
      {
        test: /\.svg$/,
        use: 'file-loader',
      },
      {
        test: /\.png$/,
        use: [
          {
            loader: 'url-loader',
            options: {
              mimetype: 'image/png',
            },
          },
        ],
      },
    ],
  },
  optimization: {
    minimize: true,
    minimizer: [
      new EsbuildPlugin({
        target: 'es2020',
        css: true,
      }),
    ],
  },
  resolve: {
    extensions: ['.js', '.jsx', '.tsx', '.ts'],
    alias: {
      process: 'process/browser',
      '@dsnp/parquetjs': path.resolve(
        __dirname,
        'node_modules',
        '@dsnp',
        'parquetjs',
        'dist',
        'browser',
        'parquet.cjs.js'
      ),
    },
    fallback: {
      'process/browser': require.resolve('process/browser'),
      stream: require.resolve('stream-browserify'),
    },
  },
  devServer: {
    contentBase: './dist',
  },
  plugins: [
    new CopyPlugin({
      patterns: [{ from: 'public', to: '.' }],
    }),
    new MiniCssExtractPlugin(),
    new NodePolyfillPlugin(),
    BufferPlugin,
  ],
  devtool: false,
}

The dependency versions are:

I have a file where I import the dependency and its types as such:

import parquet, { ParquetSchema } from '@dsnp/parquetjs'
import { ParquetType } from '@dsnp/parquetjs/dist/lib/declare'

and then call the openFile function as such:

const downloadParquetFile = async (data: any) => {
  // other non relevant code

  // generate the filename
  const fileName: string = randomFileNameGenerator()

  console.log('before open')

  // Create a Parquet writer
  const writer = await parquet.ParquetWriter.openFile(schema, fileName)

  console.log('after open')
}

Filename is just a random name.

I have a website with the same functionality, so I know that my implementation is correct. After a bit of digging, I think the issue is related to the fs dependency, which in the browser is replaced by zenfs, if I am correct.

Expected behaviour

I am expecting the data to be added to the file.

Actual behaviour

Nothing happens. My best guess is that the stream never opens, and therefore the promise in the osopen function never resolves.

export const osopen = function (path: string | Buffer | URL, opts?: WriterOptions): Promise<WriteStream> {
  return new Promise((resolve, reject) => {
    const outputStream = fs.createWriteStream(path, opts);

    outputStream.on('open', function (_fd) {
      resolve(outputStream);
    });

    outputStream.on('error', function (err) {
      reject(err);
    });
  });
};

I can see the first console.log output from the example code I provided above but not the second one which should have been called once the stream was open.
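One way to confirm that the promise is hanging, rather than failing silently, is to race it against a timer. The following is a minimal sketch, not part of parquetjs: `withTimeout` and `neverOpens` are hypothetical names introduced here for illustration, and `neverOpens` merely stands in for an open call whose 'open' event never fires.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
// Hypothetical helper for debugging; it is not provided by parquetjs.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Stand-in for a stream that never emits 'open': this promise never settles.
const neverOpens = new Promise<never>(() => {});

// Turns the silent hang into a visible error in the console.
withTimeout(neverOpens, 100, 'openFile').catch((err: Error) => {
  console.error(err.message);
});
```

In the extension, the same wrapper could be placed around the `parquet.ParquetWriter.openFile(schema, fileName)` call from the example above, so that a stream that never opens surfaces as a rejected promise instead of an indefinite wait.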

Any logs, error output, etc?

No errors, no logs, the process just hangs indefinitely.

...

Any other comments?

I realize this is a bit of a long shot and that my use case is very niche but I am open to trying anything to make this work. ...

wilwade commented 1 month ago

Hmm... Ok. I don't know much about React Native (been a while since I worked with it).

My understanding of what you would need to do (might be wrong in a few ways):

YannHulot commented 1 month ago

Hmm... Ok. I don't know much about React Native (been a while since I worked with it).

My understanding of what you would need to do (might be wrong in a few ways):

* Use something like https://blog.logrocket.com/how-to-access-file-systems-react-native/ to get general file system operations working.

* Use a require alias to map the zenfs dependency to one of the react native compatible interfaces. Since zenfs is the same interface as node fs, any node fs compatible React Native package should work.

  * https://webpack.js.org/configuration/resolve/#resolvealias

Thanks for your response.

The Chrome extension is written in pure React + TypeScript, not React Native.

I will take a look at the link related to aliasing the fs dependency and report back if it works.
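For reference, the aliasing suggestion could look roughly like the fragment below. This is only a sketch of the `resolve.alias` mechanism: the module name `@zenfs/core` is an assumption (the thread only says "zenfs"), and `my-fs-shim` is a hypothetical placeholder for whatever fs-compatible package would be substituted. Whether an alias reaches code that is already bundled inside `dist/browser/parquet.cjs.js` is also untested here.

```typescript
// Minimal shape of the part of a webpack config used in this sketch.
interface ResolveConfig {
  alias: Record<string, string>;
}

const config: { resolve: ResolveConfig } = {
  resolve: {
    alias: {
      // Assumption: '@zenfs/core' is the module the browser build imports.
      // 'my-fs-shim' is a hypothetical fs-compatible replacement package.
      '@zenfs/core': 'my-fs-shim',
    },
  },
};
```

In the config posted earlier in this issue, such an entry would sit alongside the existing `process` and `@dsnp/parquetjs` aliases.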

wilwade commented 1 month ago

It's possible the new release fixes this as well, if the cause was an issue in zenfs that has since been updated: https://github.com/LibertyDSNP/parquetjs/releases/tag/v1.8.4