NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Apache beam GCP would be awesome to have #167913

Open blackgnezdo opened 2 years ago

blackgnezdo commented 2 years ago

Project description

I was happy to find that Apache Beam is already packaged. Sadly, trying to run an example immediately failed because of a missing runtime dependency:

 ❯ nix-channel --add https://nixos.org/channels/nixpkgs-unstable unstable
 ❯ nix-channel --update
 ❯ nix-shell -p '(import <unstable> {}).python39Packages.apache-beam'

[nix-shell:~]$ python3 ~/beam/examples/wordcount.py --output /tmp/x

/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/__init__.py:79: UserWarning: This version of Apache Beam has not been sufficiently tested on Python 3.9. You may encounter bugs or missing features.
  warnings.warn(
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
Traceback (most recent call last):
  File "/home/gnezdo/beam/examples/wordcount.py", line 94, in <module>
    run()
  File "/home/gnezdo/beam/examples/wordcount.py", line 73, in run
    lines = p | 'Read' >> ReadFromText(known_args.input)
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/textio.py", line 666, in __init__
    self._source = self._source_class(
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/textio.py", line 128, in __init__
    super().__init__(
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/filebasedsource.py", line 124, in __init__
    self._validate()
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/options/value_provider.py", line 193, in _f
    return fnc(self, *args, **kwargs)
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/filebasedsource.py", line 185, in _validate
    match_result = FileSystems.match([pattern], limits=[1])[0]
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/filesystems.py", line 203, in match
    filesystem = FileSystems.get_filesystem(patterns[0])
  File "/nix/store/v264lg3apzdj0zpc4xb6zg29a17mcxnw-python3.9-apache-beam-2.35.0/lib/python3.9/site-packages/apache_beam/io/filesystems.py", line 103, in get_filesystem
    raise ValueError(
ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: gs://dataflow-samples/shakespeare/kinglear.txt

Elsewhere, pip install apache-beam[gcp] would suffice, but that is clearly not the Nix way.
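The ValueError is raised by FileSystems.get_filesystem, which chooses a filesystem implementation from the path's scheme. The traceback suggests that nothing is registered for gs:// in this build, presumably because Beam only makes its GCS filesystem available when the optional [gcp] dependencies can be imported. The sketch below is a simplified, hypothetical model of that lookup, not Beam's actual code; build_registry, get_scheme and get_filesystem are illustrative names, and the gcp_available flag stands in for "the [gcp] dependencies are installed".

```python
import re


def build_registry(gcp_available):
    """Hypothetical scheme -> filesystem table.

    Plain local paths have no scheme (""). "gs" is registered only when the
    optional GCP dependencies are assumed to be importable.
    """
    registry = {"": "LocalFileSystem"}
    if gcp_available:
        registry["gs"] = "GCSFileSystem"
    return registry


def get_scheme(path):
    """Return the URI scheme of `path` (e.g. "gs"), or "" for plain paths."""
    match = re.match(r"([a-zA-Z][a-zA-Z0-9+.-]*)://", path.strip())
    return match.group(1).lower() if match else ""


def get_filesystem(path, registry):
    """Look up the filesystem for `path`; unknown schemes raise ValueError."""
    scheme = get_scheme(path)
    if scheme not in registry:
        raise ValueError(f"Unable to get filesystem from specified path: {path}")
    return registry[scheme]
```

In this model, get_filesystem("/tmp/x", build_registry(False)) resolves to the local filesystem, while the default wordcount input gs://dataflow-samples/shakespeare/kinglear.txt raises ValueError until gcp_available is True. Installing the [gcp] extras is what flips that switch in the real package.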


a-h commented 1 year ago

I was looking for a solution myself; folks on the NixOS Discourse forum helped, and I now have a working shell.nix that can use GCP (at least for accessing Cloud Storage, which is what I tested):

https://discourse.nixos.org/t/equivalent-of-pip-install-apache-beam-gcp/25377/5

It's not exactly straightforward, and I couldn't have worked it out on my own, but it does work.

a-h commented 1 year ago

To add to the complexity, it doesn't currently work on nixpkgs HEAD; see https://github.com/NixOS/nixpkgs/issues/212691

Fortunately, with a Nix flake you can pin nixpkgs to a revision that does work.

flake.nix

{
  description = "GCP Dataflow setup";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/35f1f865c03671a4f75a6996000f03ac3dc3e472";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        beam = import ./beam.nix { inherit pkgs; };
        shell = pkgs.mkShell {
          packages = [
            pkgs.gcc-unwrapped.lib
            pkgs.poetry
            beam.beam
          ];
        };
      in
      {
        # nix build .#beam
        packages.beam = beam.beam;
        # nix shell
        packages.default = beam.beam;
        # nix develop
        devShells.default = shell;
      });
}

beam.nix

{ pkgs ? import <nixpkgs> {} }:

{
  beam = pkgs.python39.withPackages (ps: with ps; [
    # [gcp] optionals.
    cachetools
    google-apitools
    google-auth
    google-auth-httplib2
    google-cloud-datastore
    google-cloud-pubsub
    # google-cloud-pubsublite - Not found
    google-cloud-bigquery
    google-cloud-bigquery-storage
    google-cloud-core
    google-cloud-bigtable
    google-cloud-spanner
    google-cloud-dlp
    google-cloud-language
    google-cloud-videointelligence
    google-cloud-vision
    # google-cloud-recommendations-ai - Not found
    # End of [gcp] section.
    google-cloud-storage
    apache-beam
    grpcio
  ]);
}
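The list above transcribes the [gcp] extra by hand, and two entries could not be found in nixpkgs. One way to check it against what the pinned Beam release actually declares is to read the installed package's metadata with importlib.metadata. This is a rough, string-based sketch (extra_requirements and installed_extra_requirements are hypothetical helper names), not a full PEP 508 marker parser; the packaging library handles markers properly if you need that.

```python
import re
from importlib import metadata


def extra_requirements(requires, extra):
    """Return distribution names from `requires` whose marker selects `extra`.

    `requires` is a list of requirement strings such as
    'cachetools<5,>=3.1.0; extra == "gcp"'.
    """
    # Match extra == "gcp" or extra == 'gcp' anywhere in the marker.
    pattern = re.compile(r"extra\s*==\s*(['\"])" + re.escape(extra) + r"\1")
    names = []
    for req in requires:
        spec, _, marker = req.partition(";")
        if pattern.search(marker):
            name = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
            if name:
                names.append(name.group(0))
    return names


def installed_extra_requirements(dist="apache_beam", extra="gcp"):
    """Read the requirements of an installed distribution and filter them.

    Raises importlib.metadata.PackageNotFoundError if `dist` is not installed.
    """
    # metadata.requires returns None when no requirements are declared.
    return extra_requirements(metadata.requires(dist) or [], extra)
```

Inside the environment built from beam.nix, print(installed_extra_requirements()) should list the [gcp] dependencies that apache-beam declares, which can then be compared with the entries above.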

Then run nix shell to add the Python environment to your current shell, or nix develop to start a new shell with the right packages in it.
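Before rerunning wordcount.py, a quick sanity check is to confirm that the interpreter on your PATH can import the GCP pieces. The sketch below uses importlib.util.find_spec, which locates a module without fully importing it. The import names in GCP_MODULES are my assumption for a few of the packages in beam.nix (for example, google-apitools imports as apitools); adjust the list as needed.

```python
import importlib.util

# Assumed import names for some of the packages listed in beam.nix.
GCP_MODULES = [
    "apache_beam",
    "apitools",               # google-apitools
    "cachetools",
    "google.auth",            # google-auth
    "google_auth_httplib2",   # google-auth-httplib2
    "google.cloud.storage",   # google-cloud-storage
    "google.cloud.bigquery",  # google-cloud-bigquery
]


def missing_modules(names):
    """Return the subset of `names` that this interpreter cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For dotted names, find_spec imports the parent package first,
            # which raises if the parent is absent or broken.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Running print(missing_modules(GCP_MODULES)) inside nix develop should print an empty list if the environment is complete; any names it does print point at packages still missing from beam.nix.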