Closed martinschorb closed 2 years ago
We should also add a test for this, I think the best place is https://github.com/constantinpape/cluster_tools/blob/master/test/downscaling/test_downscaling.py.
Do you want to add signed test data to the archive, or would you prefer generating it?
My suggestion would be to use the hdf5 example, simply changing its dtype and shifting its values.
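Generating the signed test data from the hdf5 example could look roughly like this (file and dataset names are hypothetical, and a small synthetic uint8 volume stands in for the real example data; the shift maps the full uint8 range onto int8):

```python
import numpy as np
import h5py

# Hypothetical file/key names; adapt to the data used in
# test/downscaling/test_downscaling.py. A small synthetic uint8 volume
# stands in for the real hdf5 example here.
with h5py.File("sampleA.h5", "w") as f:
    f.create_dataset("volumes/raw",
                     data=np.random.randint(0, 256, (8, 8, 8), dtype=np.uint8))

with h5py.File("sampleA.h5", "r") as f:
    raw = f["volumes/raw"][:]

# change the dtype and shift the values: uint8 [0, 255] -> int8 [-128, 127]
signed = (raw.astype(np.int16) - 128).astype(np.int8)

with h5py.File("sampleA_signed.h5", "w") as f:
    f.create_dataset("volumes/raw", data=signed, compression="gzip")
```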
Yes, I agree that sounds like the best solution.
When I try to run an import it also fails for me.
import_data/utils.py:102, in downscale(in_path, in_key, out_path, resolution, scale_factors, chunks, tmp_folder, target, max_jobs, block_shape, library, library_kwargs, metadata_format, out_key, unit, source_name, roi_begin, roi_end, int_to_uint)
100 ret = luigi.build([t], local_scheduler=True)
101 if not ret:
--> 102 raise RuntimeError("Downscaling failed")
Is there a way to get a more detailed idea what exactly went wrong?
It's in the test log:
ERROR: [pid 2798] Worker Worker(salt=763965705, workers=1, host=fv-az196-745, username=runner, pid=2798) failed CopyVolumeLocal(tmp_folder=./tmp, max_jobs=2, config_dir=./tmp/config, input_path=/home/runner/work/cluster_tools/cluster_tools/test_data/sampleA.n5, input_key=volumes/raw/s0, output_path=./tmp/data.n5, output_key=data/s0, prefix=initial_scale, dtype=None, int_to_uint=False, fit_to_roi=False, effective_scale_factor=[], dimension_separator=None, dependency=DummyTask)
Traceback (most recent call last):
File "/usr/share/miniconda/envs/cluster_env/lib/python3.8/site-packages/luigi/worker.py", line 191, in run
new_deps = self._run_get_new_deps()
File "/usr/share/miniconda/envs/cluster_env/lib/python3.8/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
task_gen = self.task.run()
File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/cluster_tasks.py", line 95, in run
raise e
File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/cluster_tasks.py", line 81, in run
self.run_impl()
File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/copy_volume/copy_volume.py", line 119, in run_impl
dtype = DTYPE_MAPPING.get(dtype, dtype)
UnboundLocalError: local variable 'dtype' referenced before assignment
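The `UnboundLocalError` arises because `dtype` is only assigned on one code path before the mapping lookup. A minimal sketch of the bug pattern and its fix (the names below are illustrative, not the actual copy_volume.py code):

```python
DTYPE_MAPPING = {"<i4": "int32"}  # illustrative mapping, not the real one

def broken(int_to_uint):
    if not int_to_uint:
        dtype = "uint8"  # only assigned on this branch
    # raises UnboundLocalError when int_to_uint is True:
    # dtype was never assigned on that path
    return DTYPE_MAPPING.get(dtype, dtype)

def fixed(int_to_uint):
    dtype = "uint8"      # assign a value on every code path
    if int_to_uint:
        dtype = "int8"   # e.g. derived from the input dataset
    return DTYPE_MAPPING.get(dtype, dtype)
```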
OK. Runs now. Now to the tests...
- a check that makes sure that the `dtype` argument (which is available through `self.dtype`) is compatible with the resulting unsigned dtype if `self.dtype is not None` and `int_to_uint is True`.
As it is now, the `dtype` argument is simply ignored when `int_to_uint` is requested (`dtype` gets overwritten by the unsigned data dtype).
Would you like a warning to be issued if both arguments are given?
If we assume that the dtype provided by `elf.io.open_file` is a `numpy.dtype`, the check for it being a signed int should be sufficient.
No, this is more complex than you think (and also than I initially thought); see line 119. If `dtype` is passed as an argument, none of this makes sense. Please add a check at the beginning that raises an error if `self.int_to_uint is True` and `self.dtype is not None`.
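A sketch of such a guard, assuming the task exposes `self.dtype` and `self.int_to_uint` as described above (the helper name is hypothetical):

```python
def check_dtype_args(dtype, int_to_uint):
    # Hypothetical helper: refuse conflicting arguments early, since
    # int_to_uint derives the output dtype from the input data and would
    # silently override an explicitly requested dtype.
    if int_to_uint and dtype is not None:
        raise ValueError(
            "'dtype' and 'int_to_uint' are mutually exclusive; "
            "int_to_uint determines the output dtype from the input data"
        )
```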
Doesn't, in the end, https://github.com/martinschorb/cluster_tools/blob/c05642dba9ec1ebc7b534c7edb03034ba8917eda/cluster_tools/copy_volume/copy_volume.py#L125 take over everything before, so that only `ds.dtype`, which comes from elf, is relevant for being checked?
I overlooked this, but what you did there was not quite correct. You need to use `dtype` instead of `ds.dtype` there (I updated this already). The reason is that there is no guarantee that the dtype will be string-represented as e.g. `int32`; it could also be `<i4`. elf doesn't implement any special checks for this; it just returns the `.dtype` of the underlying storage class, e.g. the h5py dataset, zarr array, or z5py dataset. That's the whole reason for the dtype mapping happening here (which could be even more complete, and there is hopefully a better way to do this that I could adopt eventually, but that's out of scope now).
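To illustrate why comparing raw dtype strings is unreliable: numpy accepts several spellings for the same dtype, e.g. the byte-order-explicit `<i4` for `int32`. Normalizing through `np.dtype(...).name` is one way to get a canonical spelling (a sketch of the general idea, not what copy_volume.py actually does):

```python
import numpy as np

def normalize(dtype_like):
    # np.dtype accepts strings like "int32" and "<i4" as well as numpy
    # scalar types, and .name always yields the canonical spelling.
    return np.dtype(dtype_like).name

print(normalize("<i4"))    # int32
print(normalize("int32"))  # int32
print(normalize(np.int8))  # int8
```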
I went ahead and fixed the remaining issues (there were a couple; I can only encourage you again to use a linter: it helps tremendously to write correct Python code, because you will find many issues without needing to actually run the code) and added the test.
Also, I think that the code you used to translate between dtypes was wrong: it should be
`(data - np.iinfo(data.dtype).min).astype(dtype)`
instead of
`(data - np.iinfo(data.dtype).min - 1).astype(dtype)`
see the screenshots below:
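The first formula can be checked on the `int8` extremes: subtracting `iinfo.min` (i.e. adding 128) maps [-128, 127] exactly onto [0, 255], while the extra `- 1` would shift everything off by one:

```python
import numpy as np

data = np.array([-128, 0, 127], dtype=np.int8)
# upcast before subtracting to avoid int8 overflow during the shift
shifted = (data.astype(np.int16) - np.iinfo(data.dtype).min).astype(np.uint8)
print(shifted)  # [  0 128 255]
```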
I'll go ahead and merge this.
Yes, for some reason I assumed that the minimum of `int8` was -127, hence I added the offset.
I don't know if you want to test for this in all potential output formats, or if we simply trust `numpy` to do it right.