dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

can't upload files #149

Closed xxxxsk closed 3 years ago

xxxxsk commented 3 years ago

When deploying my project, I can't import functions from my own *.py modules, and a FileNotFoundError is raised after client.upload_file('').

The directory structure of dask_test:

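test.py:

from dask_yarn import YarnCluster
from dask.distributed import Client
import time
import pandas as pd
import os

# Connect to the YARN cluster this script is running in
# (e.g. when launched with `dask-yarn submit test.py`)
cluster = YarnCluster.from_current()
client = Client(cluster)


def square(x):
    return x ** 2


ts = time.time()
# Send a local module to every worker. The relative path is resolved
# against this process's working directory; this is the call reported
# to end in a FileNotFoundError.
client.upload_file('upload_file.py')
A = client.map(square, range(5))  # one future per input
total = client.submit(sum, A)     # futures in A are resolved before sum runs
print(total.result())
print('1cost time :%s' % (time.time() - ts))
client.close()
cluster.close()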

How do I correctly upload a file and import from it?
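One likely cause of the FileNotFoundError (an assumption, not confirmed in this thread): client.upload_file reads the file from the client process's own filesystem, and a relative path such as 'upload_file.py' is resolved against that process's current working directory. When the script runs inside a YARN container, that directory may not contain the file. The helper below, resolve_upload_path, is hypothetical (it is not part of dask or dask-yarn); it fails early and reports the resolved path and working directory, which makes this easy to confirm.

```python
import os


def resolve_upload_path(path):
    """Return an absolute path for `path`, or fail early with context.

    Relative paths are resolved against the current working directory of
    this process, which inside a YARN container may not hold the file.
    """
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(
            f"{path!r} resolved to {full_path!r}, which does not exist "
            f"(current working directory: {os.getcwd()!r})"
        )
    return full_path


# Hypothetical usage inside the submitted script:
# client.upload_file(resolve_upload_path('upload_file.py'))
```

If the error message shows a container working directory, the file was never shipped with the application, and uploading it from there cannot work.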

Environment:

quasiben commented 3 years ago

I don't think dask-yarn can submit both a dependent file and the script to run. I'd recommend bundling all of your scripts into one file. Alternatively, you could install the library into the environment that is packaged with the application: https://yarn.dask.org/en/latest/environments.html#managing-python-environments
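If you go the packaged-environment route, one way to check that a module really is importable on every worker is to run a small function on all of them with distributed's Client.run, which returns a dict keyed by worker address. The module_available helper below is hypothetical (not part of dask-yarn), and it only handles top-level module names.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be imported here.

    Meant to be executed on each worker via ``Client.run`` to confirm the
    packaged environment actually ships the module.
    """
    return importlib.util.find_spec(name) is not None


# Hypothetical usage from the submitted script ('upload_file' is the
# module name from this issue; substitute your own package):
# results = client.run(module_available, 'upload_file')
# missing = [addr for addr, ok in results.items() if not ok]
```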

xxxxsk commented 3 years ago

I uploaded my files to HDFS, and it works.