Describe the bug
When a dataset returned by dataset_tools.preprocess has zero chunks to run over, dataset_tools.apply_to_fileset raises a FileNotFoundError on an empty file mapping ({}). I think this usually happens after something more serious has already gone wrong upstream, but I occasionally see people run into it, and the error does not make it obvious what happened.
To Reproduce
The following requires the fix for #1140 to run:
import uproot
import awkward as ak
from coffea import dataset_tools
from coffea.nanoevents import BaseSchema
import dask
with uproot.recreate("f1.root") as f:
    f["tree"] = {"arr": ak.Array([])}
with uproot.recreate("f2.root") as f:
    f["tree"] = {"arr": ak.Array([1])}
fileset = {"dummy": {"files": {"f1.root": "tree"}}}
# fileset = {"dummy": {"files": {"f1.root": "tree", "f2.root": "tree"}}} # this works
samples, _ = dataset_tools.preprocess(fileset)
tasks = dataset_tools.apply_to_fileset(lambda evts: None, samples, schemaclass=BaseSchema)
_ = dask.compute(tasks)
Expected behavior
A warning along the lines of "no usable files found for dataset xyz", and no exception raised by default.
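Until apply_to_fileset handles this itself, a user-side guard can drop empty datasets before they reach uproot. The sketch below is a hypothetical helper (`drop_empty_datasets` and its `strict` flag are not part of coffea). It assumes the preprocessed fileset has the shape `{dataset: {"files": {path: {"steps": [...], ...}}, ...}}`, and treats a dataset as empty when it has no files or none of its files have any steps. It warns by default and raises only when `strict=True`, mirroring the behaviour requested above.

```python
import warnings
from typing import Any, Dict


def drop_empty_datasets(fileset: Dict[str, Any], strict: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of a preprocessed fileset without
    datasets that have zero chunks.

    Assumes each dataset looks like {"files": {path: {"steps": [...], ...}}, ...}.
    """
    kept = {}
    for name, dataset in fileset.items():
        files = dataset.get("files") or {}
        # Count chunks across all files; entries that are not dicts (e.g. an
        # unpreprocessed "tree" string) contribute no steps under this assumption.
        n_chunks = sum(
            len(info.get("steps") or [])
            for info in files.values()
            if isinstance(info, dict)
        )
        if n_chunks == 0:
            msg = f"no usable files found for dataset {name!r}"
            if strict:
                # Opt-in hard failure instead of the default warning.
                raise ValueError(msg)
            warnings.warn(msg, stacklevel=2)
            continue
        kept[name] = dataset
    return kept
```

In the reproducer, calling `samples = drop_empty_datasets(samples)` between preprocess and apply_to_fileset should skip `dummy` with a warning instead of raising, leaving an empty set of tasks to compute.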
Output
Traceback (most recent call last):
File "[...]/test.py", line 17, in <module>
tasks = dataset_tools.apply_to_fileset(lambda evts: None, samples, schemaclass=BaseSchema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/coffea/dataset_tools/apply_processor.py", line 125, in apply_to_fileset
dataset_out = apply_to_dataset(
^^^^^^^^^^^^^^^^^
File "[...]/coffea/dataset_tools/apply_processor.py", line 73, in apply_to_dataset
).events()
^^^^^^^^
File "[...]/coffea/nanoevents/factory.py", line 684, in events
events = self._mapping(form_mapping=self._schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/uproot/_dask.py", line 183, in dask
files = uproot._util.regularize_files(files, steps_allowed=True, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[...]/uproot/_util.py", line 946, in regularize_files
raise _file_not_found(files)
FileNotFoundError: file not found
{}
Files may be specified as:
* str/bytes: relative or absolute filesystem path or URL, without any colons
other than Windows drive letter or URL schema.
Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
* str/bytes: same with an object-within-ROOT path, separated by a colon.
Example: "rel/file.root:tdirectory/ttree"
* pathlib.Path: always interpreted as a filesystem path or URL only (no
object-within-ROOT path), regardless of whether there are any colons.
Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
Functions that accept many files (uproot.iterate, etc.) also allow:
* glob syntax in str/bytes and pathlib.Path.
Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
* dict: keys are filesystem paths, values are objects-within-ROOT paths.
Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
* already-open TTree objects.
* iterables of the above.
Desktop (please complete the following information):
n/a
Additional context
n/a