bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Where did this job save its stuff? #4247

Open aronchick opened 1 month ago

aronchick commented 1 month ago

There are no details about the output?


❯ bacalhau job describe j-acb8621a
ID            = j-acb8621a-99f7-408e-9e00-ed03592b7dcf
Name          = Run Over Share
Namespace     = science
Type          = batch
State         = Completed
Count         = 1
Created Time  = 2024-07-20 01:00:25
Modified Time = 2024-07-20 01:01:57
Version       = 0

Summary
Completed = 1

Job History
 TIME                 REV.  STATE      TOPIC       EVENT
 2024-07-20 01:00:25  1     Pending    Submission  Job submitted
 2024-07-20 01:00:27  2     Running
 2024-07-20 01:01:57  3     Completed

Executions
 ID          NODE ID     STATE      DESIRED  REV.  CREATED     MODIFIED  COMMENT
 e-a4072a7c  n-3ede3924  Completed  Stopped  6     10m40s ago  9m9s ago  Accepted job

Execution e-a4072a7c History
 TIME                 REV.  STATE              TOPIC            EVENT
 2024-07-20 01:00:25  1     New
 2024-07-20 01:00:25  2     AskForBid
 2024-07-20 01:00:27  3     AskForBidAccepted  Requesting Node  Accepted job
 2024-07-20 01:00:27  4     AskForBidAccepted
 2024-07-20 01:00:27  5     BidAccepted
 2024-07-20 01:01:57  6     Completed

❯ bacalhau job describe j-acb8621a --help
Full description of a job, in yaml format. Use 'bacalhau job list' to get a list of jobs.

Here's the job spec:


❯ more jobs/template_job.yaml
Name: Run Over Share
Namespace: science
Type: batch
Count: 1
Tasks:
  - Name: Run over share
    InputSources:
      - Source:
          Type: localDirectory
          Params:
            SourcePath: /mnt/azureshare
        Target: /azureshare
    Publisher:
      Type: local
    Engine:
      Type: docker
      Params:
        # the docker container that will download videos and perform inference
        Image: docker.io/bacalhauproject/python-runner:2024.07.19.1745
        EnvironmentVariables:
          - COMMAND={{.fulltext}}
          - B64_ENCODED=True
          - FILE_PATH=/azureshare/spliced_blc0001020304050607_guppi_57532_10225_HIP56445_0029.gpuspec.0000.h5
          - DEBUG1=True
    Resources:
      # dependent on compute nodes, this is based on e2-standard-8
      CPU: "4"
      Memory: "16GB"
      Disk: "16GB"
wdbaruni commented 1 month ago
    Publisher:
      Type: local

By default, the result stays on the local compute node and can be accessed directly from it. Since we don't support tunneling through the requester node, the compute node must also be reachable by the client.
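In practice, that means fetching results with `bacalhau job get` (mentioned below in this thread), which only works against the local publisher if the compute node is reachable from the client. A sketch using the job ID from this thread; the output directory is a placeholder, as the local publisher's storage path depends on the compute node's configuration:

```shell
# Download the published results into a local directory.
# With the local publisher, this requires the compute node
# to be reachable from this client.
bacalhau job get j-acb8621a

# Alternatively, on the compute node itself, inspect the local
# publisher's storage directory (hypothetical path -- check your
# node's configuration for the actual location).
ls /path/to/local-publisher-storage
```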

The local publisher is only intended for testing purposes, though we need a better default publisher without the local publisher's limitations.
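Until then, one way around the local publisher's reachability limitation is to declare an explicit publisher in the job spec. A sketch assuming an S3-compatible publisher is available in the network; the bucket name and key prefix are placeholders, not values from this thread:

```yaml
# Replaces the "Publisher: Type: local" block in the task above.
Publisher:
  Type: s3
  Params:
    Bucket: my-results-bucket   # placeholder bucket name
    Key: jobs/{jobID}/          # placeholder key prefix
```

With a remote publisher like this, the client no longer needs direct network access to the compute node to retrieve results.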

wdbaruni commented 1 month ago

Your question, though, was "Where did this job save its stuff?" To help us make the right decisions as we evaluate alternatives to the local publisher:

  1. How important is that to the user?
  2. If `bacalhau job get` just works, does it matter to the user where the result is stored?
  3. Do we think a default publisher will cause confusion?
  4. Instead of encoding a default publisher, should we just fail jobs that define outputs without a publisher, and list the available publishers in the network?