Hi @kelvin1794, thanks a lot for raising the issue. Unfortunately, I cannot reproduce it in the dev environment, so I need more info for debugging. Could you please show the contents of this folder via the CLI command:
databricks fs ls dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts
and this one:
databricks fs ls dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts/tests/integration
Hi @renardeinside, thank you so much for helping.
When I run
databricks fs ls dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts
this is the result:
.dbx
dist
tests
And when I run
databricks fs ls dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts/tests/integration
this is the result:
sample_test.py
Also, just a side note: after I installed the wheel file on the cluster, there seems to be a conflict, because when I run a notebook, all the cell results become "Cancelled".
@kelvin1794, could you please try the following command:
databricks fs cat dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts/tests/integration/sample_test.py
I have a guess that for some strange reason your user has no permission to read the file. Do you see the file content in the output?
@renardeinside, thanks for the prompt reply.
When I run it, the output is the content of the sample_test.py file. Is there any possible reason you could think of? Is there some extra configuration I need to do with dbx, or something?
import unittest
from amgen_databricks_pipelines.jobs.sample.entrypoint import SampleJob
from uuid import uuid4
from pyspark.dbutils import DBUtils  # noqa


class SampleJobIntegrationTest(unittest.TestCase):
    def setUp(self):
        self.test_dir = "dbfs:/tmp/tests/sample/%s" % str(uuid4())
        self.test_config = {"output_format": "delta", "output_path": self.test_dir}
        self.job = SampleJob(init_conf=self.test_config)
        self.dbutils = DBUtils(self.job.spark)
        self.spark = self.job.spark

    def test_sample(self):
        self.job.launch()
        output_count = (
            self.spark.read.format(self.test_config["output_format"])
            .load(self.test_config["output_path"])
            .count()
        )
        self.assertGreater(output_count, 0)

    def tearDown(self):
        self.dbutils.fs.rm(self.test_dir, True)


if __name__ == "__main__":
    # please don't change the logic of test result checks here
    # it's intentionally done in this way to comply with jobs run result checks
    # for other tests, please simply replace the SampleJobIntegrationTest with your custom class name
    loader = unittest.TestLoader()
    tests = loader.loadTestsFromTestCase(SampleJobIntegrationTest)
    runner = unittest.TextTestRunner(verbosity=2)
    result = runner.run(tests)
    if not result.wasSuccessful():
        raise RuntimeError(
            "One or multiple tests failed. Please check job logs for additional information."
        )
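For context, in the standard dbx CI templates a file like this is not run locally but submitted to Databricks as a job, with the artifact folder shown above serving as the job's file source. The two commands below are a sketch of how such a test is typically deployed and launched; the job name sample-integration-test is a placeholder, and the flags assume a dbx version from around the time of this thread:
dbx deploy --jobs=sample-integration-test --files-only
dbx launch --job=sample-integration-test --as-run-submit --trace
Here --files-only uploads the artifacts without registering a permanent job, --as-run-submit launches a run against those uploaded files, and --trace makes the launch block until the run finishes so a failing test fails the pipeline.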
I'm still not sure about the root cause tbh.
Could you please take a look at the run logs for run ***#job/2703892/run/1?
The Error log and log4j sections might provide some more clues.
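As a side note, the run output can also be pulled via the Databricks CLI, which may be easier to share than screenshots; the run id below is a placeholder for the one in the run URL above:
databricks runs get-output --run-id <run-id>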
Thanks @renardeinside. I guess I'll have to dig in, and will update if I can find the answer.
@renardeinside, thank you. When I tested on a different Databricks instance, it worked fine, so I guess it was something to do with the previous Databricks environment. Thank you!
I am following the template with Databricks hosted on AWS and running with GitHub Actions, but bumped into the error below. What could cause an error like this?
Cannot read the python file...
Appreciate any help. Thank you!