Possibly because the Python lib doesn't detect it as a directory when it's a mountpoint?
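For illustration, here's a minimal sketch (not the actual `dotscience` code; `collect_inputs` is a hypothetical name) of how an input tracker that expands directories could end up skipping a mountpoint, if it used a stricter check than `os.path.isdir`:

```python
import os

def collect_inputs(path):
    # Hypothetical sketch; the real dotscience library may differ.
    # os.path.isdir() returns True for ordinary directories AND for
    # mountpoints, so this check alone would handle a mounted dataset.
    if os.path.isdir(path):
        return [os.path.join(root, name)
                for root, _, files in os.walk(path)
                for name in files]
    # A stricter test such as
    #   os.path.isdir(path) and not os.path.ismount(path)
    # would send a dataset mountpoint down this branch instead,
    # recording it as a single opaque file with no contents.
    return [path]
```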
According to the setup of that project, the dataset mount point is `roadsigns`, not `data`.
The run metadata claims that it read `data` from the workspace dot:
```json
[
  {
    "run_id": "e0bdee53-4da4-45f2-86ae-6f29ef51c2f6",
    "authority": 0,
    "description": "pretended to do some data science with my data",
    "workload_file": "test.py",
    "success": true,
    "workspace_input_files": [
      {
        "filename": "data",
        "version": "02274bf7-fdaf-4cc8-ac5c-5e535a0f1070"
      }
    ],
    "workspace_output_files": [
      "fake-model.mdl"
    ],
    "exec_start": "2019-07-26T13:37:09.967989Z",
    "exec_end": "2019-07-26T13:37:09.96835Z"
  }
]
```
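For reference, here's a quick way to spot the suspicious entry in metadata like the above, assuming it has been saved to a file (`run-metadata.json` is a hypothetical name):

```python
import json

# hypothetical filename; paste the run metadata above into it
with open("run-metadata.json") as f:
    runs = json.load(f)

for run in runs:
    for entry in run["workspace_input_files"]:
        # "data" appearing here means the missing path was recorded
        # as a workspace file rather than as a dataset input
        print(run["run_id"], entry["filename"], entry["version"])
```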
This is consistent: the data was at `roadsigns`, so `data` was nonexistent and resolved as a path in the workspace. Presumably the run actually failed on that basis, although it didn't return a nonzero exit code, since the system thinks it succeeded?
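Until that's resolved, a defensive workaround is to fail fast when the expected mountpoint isn't there. A sketch, using only the `ds.input` call already shown in this issue:

```python
import os
import dotscience as ds

path = "roadsigns"  # the actual dataset mountpoint in this project
# ds.input() appears to happily record a nonexistent path, so guard it
if not os.path.isdir(path):
    raise FileNotFoundError(f"expected dataset mountpoint {path!r} is missing")
ds.input(path)
```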
I updated the script to look for `roadsigns`, ran it in the JupyterLab terminal, and got this output:
```python
import dotscience as ds; ds.script()
ds.input("roadsigns")
open(ds.output("fake-model.mdl"), "w").write("hehe")
ds.publish("pretended to do some data science with my data")
```
```
[[DOTSCIENCE-RUN:9a794bb8-5d16-49c5-969f-c9ddb1ee03b5]]{
    "description": "pretended to do some data science with my data",
    "end": "20190729T161248.363008",
    "input": [
        "roadsigns/roadsigns.p",
        "roadsigns/signnames.csv"
    ],
    "labels": {},
    "output": [
        "fake-model.mdl"
    ],
    "parameters": {},
    "start": "20190729T161248.362405",
    "summary": {},
    "version": "1",
    "workload-file": "test.py"
}[[/DOTSCIENCE-RUN:9a794bb8-5d16-49c5-969f-c9ddb1ee03b5]]
```
That's correct! So the `dotscience` Python library is performing as expected.
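As a sanity check, the recorded inputs can be compared against what's actually on disk. A hypothetical verification snippet, not part of the library:

```python
import os

# files actually present under the mountpoint
on_disk = sorted(
    os.path.join(root, name)
    for root, _, files in os.walk("roadsigns")
    for name in files
)
# inputs recorded in the run output above
recorded = sorted(["roadsigns/roadsigns.p", "roadsigns/signnames.csv"])
assert on_disk == recorded, (on_disk, recorded)
```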
So: is `ds run` not setting up the dataset mount correctly, I wonder?

...nope, that seems right; here's the same through `ds run`:
```
51544-06-04 21:32:13.000 Z: You have not called ds.start() yet, so I'm doing it for you!
51544-06-04 21:32:14.000 Z: [[DOTSCIENCE-RUN:cdc6b60e-10d2-4b7a-9a5d-9909ab19bcb9]]{
51544-06-04 21:32:14.000 Z:     "description": "pretended to do some data science with my data",
51544-06-04 21:32:14.000 Z:     "end": "20190729T162907.932689",
51544-06-04 21:32:14.000 Z:     "input": [
51544-06-04 21:32:14.000 Z:         "roadsigns/roadsigns.p",
51544-06-04 21:32:14.000 Z:         "roadsigns/signnames.csv"
51544-06-04 21:32:14.000 Z:     ],
51544-06-04 21:32:14.000 Z:     "labels": {},
51544-06-04 21:32:14.000 Z:     "output": [
51544-06-04 21:32:14.000 Z:         "fake-model.mdl"
51544-06-04 21:32:14.000 Z:     ],
51544-06-04 21:32:14.000 Z:     "parameters": {},
51544-06-04 21:32:14.000 Z:     "start": "20190729T162907.931883",
51544-06-04 21:32:14.000 Z:     "summary": {},
51544-06-04 21:32:14.000 Z:     "version": "1",
51544-06-04 21:32:14.000 Z:     "workload-file": "test.py"
51544-06-04 21:32:14.000 Z: }[[/DOTSCIENCE-RUN:cdc6b60e-10d2-4b7a-9a5d-9909ab19bcb9]]
```
AFAICT the problem was just that the script was looking for `data` when it's at `roadsigns`...
I did `ds.input("data")`, where `data` is the mountpoint of an S3 dataset, expecting it to recursively add everything inside; instead, I just got this:

on prod, using the latest `ds` curled from get.dotscience.com just now