PipedreamHQ / pipedream

Connect APIs, remarkably fast. Free for developers.
https://pipedream.com

[FEATURE] Enable users to supply a package.json for a workflow #12165

Open andrewjschuang opened 3 months ago

andrewjschuang commented 3 months ago

Describe the solution you'd like

I would like to provide a package.json and package-lock.json file with the following behavior:

If I supply package.json or package-lock.json, I don't want Pipedream to automatically install the latest packages from the import declarations in my Node.js steps.

joscha commented 3 months ago

I reported this issue via support; however, it is labeled as a feature request here. Unfortunately, I believe it needs to be categorized as a major security issue in the Pipedream ecosystem.

Given how short-lived and sporadic some workflow executions are, and with no control over whether the code has network access, it would be close to impossible to determine after the fact whether an attack has happened and what data was affected.

I'd love a statement on https://pipedream.com/docs/privacy-and-security about the current design and what it means for data compliance. It seems to me that steps/workflows have a very obvious supply chain issue that is opaque to the user: as a user of Pipedream, I can't see which dependency graph and versions are executed at all, let alone control them.

Why does controlling the dependency graph matter?

Controlling the complete dependency graph of an npm dependency closure is crucial because it ensures security, stability, and reliability of the software by preventing malicious code, minimizing vulnerabilities, and avoiding unexpected disruptions due to changes or removals in dependencies.

Common problems with specifying a dependency without any version and/or without controlling the whole dependency graph are:

  1. Unpredictable Updates: The Pipedream workflow will automatically use the latest version of the package, which might introduce breaking changes or new bugs.
  2. Compatibility Issues: New versions might not be compatible with existing code in steps, leading to runtime errors or failures.
  3. Security Vulnerabilities: Automatic updates can inadvertently include versions with unpatched security vulnerabilities.
  4. Dependency Conflicts: Other dependencies might require specific versions, leading to conflicts and resolution issues.
  5. Lack of Reproducibility: Runs of steps (and thus workflows) can become inconsistent, making it difficult to reproduce and debug issues.
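The first item in the list, a specifier that carries no version, can at least be detected mechanically. Below is a minimal sketch, assuming import specifiers follow the `name@version` form used later in this comment (e.g. `@my-scope/axios@0.0.1`). The helper names `parseSpecifier`, `isExactlyPinned` and `findUnpinned` are hypothetical and not part of any Pipedream or npm API.

```javascript
// Hypothetical helper: split an npm-style specifier into name and version.
function parseSpecifier(spec) {
  // A leading "@" marks a scope, so look for the version "@" after index 0.
  const at = spec.indexOf("@", 1);
  if (at === -1) return { name: spec, version: null };
  return { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

// True only for an exact x.y.z version (optionally with prerelease/build tags).
// Ranges such as "^1.2.0" or "~1.2" and dist-tags such as "latest" are rejected.
function isExactlyPinned(spec) {
  const { version } = parseSpecifier(spec);
  const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;
  return version !== null && exact.test(version);
}

// Return the specifiers that would float to whatever is published at run time.
function findUnpinned(specs) {
  return specs.filter((spec) => !isExactlyPinned(spec));
}

// Usage: only the second entry is pinned to an exact version.
console.log(findUnpinned(["axios", "@my-scope/axios@0.0.1", "lodash@^4.17.21"]));
// prints [ 'axios', 'lodash@^4.17.21' ]
```

A check like this only covers direct dependencies. Even an exactly pinned package can resolve different transitive versions on each install, which is why control over the whole graph matters.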

Examples

To give three prominent examples where this happened in the recent past on a massive scale:

Malicious Code in ua-parser-js (2021)

In October 2021, versions of the popular npm package ua-parser-js were found to contain malicious code. This package, widely used for parsing user-agent data, was compromised to include malware that allowed attackers to gain control over infected systems and steal sensitive information.

Event-Stream Package Incident (2018)

The event-stream package was tampered with to include malicious code via the flatmap-stream dependency, targeting cryptocurrency wallets like Copay to steal Bitcoin.

Left-Pad Incident (2016)

The left-pad package was unpublished by its author, causing widespread disruption in the JavaScript ecosystem due to its use as a dependency in many projects.

(This incident should not recur, as npm has since made it impossible to unpublish public packages that other packages depend on, but it shows nicely how a very simple and mundane leaf package of the dependency tree caused massive disruption.)

Workaround

Whilst I think that fixing this issue out of the box is crucial, one possible (but mildly tedious) workaround comes to mind:

For each package used in any Pipedream workflow, create a proxy package with a package-lock.json. For example, if you depend on axios, you could create @my-scope/axios, define axios as its dependency, re-export all symbols and lock the graph. If you then refer to this package with a fixed version, e.g. @my-scope/axios@0.0.1, it should theoretically pull in only the locked graph.
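The two hand-written files of such a proxy package could look roughly like what the sketch below generates. The function name `buildProxyPackage`, the CommonJS re-export and the upstream version passed in are illustrative assumptions, and the lock file itself would still have to be produced by running npm in the package directory. One caveat worth checking: npm does not include package-lock.json in a published package, whereas npm-shrinkwrap.json is published and honored on install, so the proxy would probably need the latter.

```javascript
// Hypothetical generator for the proxy package's package.json and index.js.
function buildProxyPackage(scope, upstream, upstreamVersion, proxyVersion) {
  const manifest = {
    name: `${scope}/${upstream}`,
    version: proxyVersion,
    main: "index.js",
    // Exact version on purpose: no "^" or "~" range.
    dependencies: { [upstream]: upstreamVersion },
  };
  return {
    "package.json": JSON.stringify(manifest, null, 2) + "\n",
    // Re-export everything the upstream package exposes (CommonJS assumed).
    "index.js": `module.exports = require(${JSON.stringify(upstream)});\n`,
  };
}

// Usage: the upstream version here is only an example value.
const files = buildProxyPackage("@my-scope", "axios", "1.6.8", "0.0.1");
console.log(files["package.json"]);
console.log(files["index.js"]);
```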

There are a few assumptions being made: