exaxorg / accelerator

The Accelerator is a tool for fast and reproducible processing of eBay-scale datasets on a single computer.
https://exax.org
Apache License 2.0

Very cryptic error message when a job passed as a dataset has a dataset which is not named default #2

Open pabloyoyoista opened 3 months ago

pabloyoyoista commented 3 months ago

Let's move this over from our own accelerator! Error reporting is one area where the accelerator still has a lot of room for improvement, and this is one of those small things that help new users get acquainted faster.

In method X:

def prepare(job):
  job.datasetwriter(name="something")

In method Y:

datasets=('source',)

And then the build script:

job_X = urd.build("X")
job_Y = urd.build("Y", source=job_X)

This fails with the following error message: accelerator.error.NoSuchDatasetError: Dataset 'sofya.krainova-12160' does not exist.

The actual problem here is that job_X does not have a dataset named default. This should be identifiable, so we could provide a better error. Since the job has only a single dataset, it might even be possible to pick that one instead of erroring out, although that requires further discussion.

drougge commented 3 months ago

The error message could certainly be better, suggesting the only available dataset or listing them if there are several but not too many.

Choosing one you didn't ask for does not sound good to me; it would just cause more confusion down the line.
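
To make the suggestion above concrete, here is a minimal sketch of how such a message could be assembled. It is not the accelerator's actual code: the function name `missing_dataset_message`, its parameters, and the `max_listed` cutoff are all hypothetical, and the job id `X-0` in the usage line is a made-up placeholder.

```python
def missing_dataset_message(job_id, requested, available, max_listed=5):
    """Build a more helpful message for a failed dataset lookup.

    Hypothetical helper, not part of the accelerator API.
    job_id     -- id of the job that was searched
    requested  -- dataset name that was asked for (e.g. "default")
    available  -- iterable of dataset names the job actually contains
    max_listed -- list the names only if there are at most this many
    """
    base = f"Dataset {requested!r} does not exist in job {job_id!r}."
    names = sorted(available)
    if not names:
        return base + " The job has no datasets."
    if len(names) == 1:
        # Suggest the single candidate, but never pick it silently.
        return base + f" The only dataset in this job is {names[0]!r}."
    if len(names) <= max_listed:
        listed = ", ".join(repr(n) for n in names)
        return base + f" Available datasets: {listed}."
    # Too many to list usefully; just report the count.
    return base + f" The job has {len(names)} datasets."


if __name__ == "__main__":
    print(missing_dataset_message("X-0", "default", ["something"]))
```

The sketch only reports what exists and leaves the choice to the user, in line with the concern about silently choosing a dataset that was not asked for.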

pabloyoyoista commented 3 months ago

> The error message could certainly be better, suggesting the only available dataset or listing them if there are several but not too many.

That's also a sensible option. It might also help people learn the difference between jobs and datasets.