brainglobe / cellfinder

Automated 3D cell detection in very large images
https://brainglobe.info/documentation/cellfinder/index.html
BSD 3-Clause "New" or "Revised" License

[BUG] Keras's incompatibility with `numpy>=2` breaks `cellfinder`'s model training #446

Closed alessandrofelder closed 2 days ago

alessandrofelder commented 1 month ago

Describe the bug

When I try to train a model with cellfinder napari's Training widget, I get a keras-related error:

AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.

This is likely because of a reported incompatibility between keras and numpy 2.

Full stack trace

```bash
File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:613, in create_worker..reraise(e=AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.'))
    612 def reraise(e):
--> 613     raise e
        e = AttributeError('`np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.')

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:175, in WorkerBase.run(self=)
    173 warnings.filterwarnings("always")
    174 warnings.showwarning = lambda *w: self.warned.emit(w)
--> 175 result = self.work()
        self =
    176 if isinstance(result, Exception):
    177     if isinstance(result, RuntimeError):
    178         # The Worker object has likely been deleted.
    179         # A deleted wrapped C/C++ object may result in a runtime
    180         # error that will cause segfault if we try to do much other
    181         # than simply notify the user.

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/superqt/utils/_qthreading.py:354, in FunctionWorker.work(self=)
    353 def work(self) -> _R:
--> 354     return self._func(*self._args, **self._kwargs)
        self._func =
        self =
        self._args = (TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro')), OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv'), OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1), MiscTrainingInputs(number_of_free_cpus=2))
        self._kwargs = {}

File ~/dev/cellfinder/cellfinder/napari/train/train.py:29, in run_training(training_data_inputs=TrainingDataInputs(yaml_files=(PosixPath('/home/..., output_directory=PosixPath('/home/alessandro')), optional_network_inputs=OptionalNetworkInputs(trained_model=None, model_...model_depth='50', pretrained_model='resnet50_tv'), optional_training_inputs=OptionalTrainingInputs(continue_training=False, ...ng_rate=0.0001, batch_size=16, test_fraction=0.1), misc_training_inputs=MiscTrainingInputs(number_of_free_cpus=2))
     21 @thread_worker
     22 def run_training(
     23     training_data_inputs: TrainingDataInputs,
   (...)
     26     misc_training_inputs: MiscTrainingInputs,
     27 ):
     28     print("Running training")
---> 29     train_yml(
        train_yml =
        training_data_inputs = TrainingDataInputs(yaml_files=(PosixPath('/home/alessandro/dev/training.yml'),), output_directory=PosixPath('/home/alessandro'))
        optional_network_inputs = OptionalNetworkInputs(trained_model=None, model_weights=None, model_depth='50', pretrained_model='resnet50_tv')
        optional_training_inputs = OptionalTrainingInputs(continue_training=False, augment=True, tensorboard=False, save_weights=False, save_checkpoints=True, save_progress=True, epochs=100, learning_rate=0.0001, batch_size=16, test_fraction=0.1)
        misc_training_inputs = MiscTrainingInputs(number_of_free_cpus=2)
     30         **training_data_inputs.as_core_arguments(),
     31         **optional_network_inputs.as_core_arguments(),
     32         **optional_training_inputs.as_core_arguments(),
     33         **misc_training_inputs.as_core_arguments(),
     34     )
     35     print("Finished!")

File ~/dev/cellfinder/cellfinder/core/train/train_yml.py:431, in run(output_dir=PosixPath('/home/alessandro'), yaml_file=(PosixPath('/home/alessandro/dev/training.yml'),), n_free_cpus=2, trained_model=None, model_weights=PosixPath('/home/alessandro/.brainglobe/cellfinder/models/resnet50_tv.h5'), install_path=PosixPath('/home/alessandro/.brainglobe/cellfinder/models'), model=, network_depth='50', learning_rate=0.0001, continue_training=False, test_fraction=0.1, batch_size=16, no_augment=False, tensorboard=False, save_weights=False, no_save_checkpoints=False, save_progress=True, epochs=100)
    426 else:
    427     filepath = str(
    428         output_dir / ("model" + base_checkpoint_file_name + ".keras")
    429     )
--> 431 checkpoints = ModelCheckpoint(
        filepath = '/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras'
        save_weights = False
    432     filepath,
    433     save_weights_only=save_weights,
    434 )
    435 callbacks.append(checkpoints)
    437 if save_progress:

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/keras/src/callbacks/model_checkpoint.py:173, in ModelCheckpoint.__init__(self=, filepath='/home/alessandro/model-epoch.{epoch:02d}-loss-{val_loss:.3f}.keras', monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch', initial_value_threshold=None)
    171 self.monitor_op = np.less
    172 if self.best is None:
--> 173     self.best = np.Inf
        self.best = None
        self =
        np =
    175 if self.save_freq != "epoch" and not isinstance(self.save_freq, int):
    176     raise ValueError(
    177         f"Unrecognized save_freq: {self.save_freq}. "
    178         "Expected save_freq are 'epoch' or integer values"
    179     )

File ~/mambaforge/envs/cellfinder-py311/lib/python3.11/site-packages/numpy/__init__.py:397, in __getattr__(attr='Inf')
    394 raise AttributeError(__former_attrs__[attr])
    396 if attr in __expired_attributes__:
--> 397     raise AttributeError(
        attr = 'Inf'
        __expired_attributes__ = {'geterrobj': 'Use the np.errstate context manager instead.', 'seterrobj': 'Use the np.errstate context manager instead.', 'cast': 'Use `np.asarray(arr, dtype=dtype)` instead.', 'source': 'Use `inspect.getsource` instead.', 'lookfor': "Search NumPy's documentation directly.", 'who': 'Use an IDE variable explorer or `locals()` instead.', 'fastCopyAndTranspose': 'Use `arr.T.copy()` instead.', 'set_numeric_ops': 'For the general case, use `PyUFunc_ReplaceLoopBySignature`. For ndarray subclasses, define the ``__array_ufunc__`` method and override the relevant ufunc.', 'NINF': 'Use `-np.inf` instead.', 'PINF': 'Use `np.inf` instead.', 'NZERO': 'Use `-0.0` instead.', 'PZERO': 'Use `0.0` instead.', 'add_newdoc': "It's still available as `np.lib.add_newdoc`.", 'add_docstring': "It's still available as `np.lib.add_docstring`.", 'add_newdoc_ufunc': "It's an internal function and doesn't have a replacement.", 'compat': "There's no replacement, as Python 2 is no longer supported.", 'safe_eval': 'Use `ast.literal_eval` instead.', 'float_': 'Use `np.float64` instead.', 'complex_': 'Use `np.complex128` instead.', 'longfloat': 'Use `np.longdouble` instead.', 'singlecomplex': 'Use `np.complex64` instead.', 'cfloat': 'Use `np.complex128` instead.', 'longcomplex': 'Use `np.clongdouble` instead.', 'clongfloat': 'Use `np.clongdouble` instead.', 'string_': 'Use `np.bytes_` instead.', 'unicode_': 'Use `np.str_` instead.', 'Inf': 'Use `np.inf` instead.', 'Infinity': 'Use `np.inf` instead.', 'NaN': 'Use `np.nan` instead.', 'infty': 'Use `np.inf` instead.', 'issctype': 'Use `issubclass(rep, np.generic)` instead.', 'maximum_sctype': 'Use a specific dtype instead. You should avoid relying on any implicit mechanism and select the largest dtype of a kind explicitly in the code.', 'obj2sctype': 'Use `np.dtype(obj).type` instead.', 'sctype2char': 'Use `np.dtype(obj).char` instead.', 'sctypes': 'Access dtypes explicitly instead.', 'issubsctype': 'Use `np.issubdtype` instead.', 'set_string_function': 'Use `np.set_printoptions` instead with a formatter for custom printing of NumPy objects.', 'asfarray': 'Use `np.asarray` with a proper dtype instead.', 'issubclass_': 'Use `issubclass` builtin instead.', 'tracemalloc_domain': "It's now available from `np.lib`.", 'mat': 'Use `np.asmatrix` instead.', 'recfromcsv': 'Use `np.genfromtxt` with comma delimiter instead.', 'recfromtxt': 'Use `np.genfromtxt` instead.', 'deprecate': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'deprecate_with_doc': 'Emit `DeprecationWarning` with `warnings.warn` directly, or use `typing.deprecated`.', 'disp': 'Use your own printing function instead.', 'find_common_type': 'Use `numpy.promote_types` or `numpy.result_type` instead. To achieve semantics for the `scalar_types` argument, use `numpy.result_type` and pass the Python values `0`, `0.0`, or `0j`.', 'round_': 'Use `np.round` instead.', 'get_array_wrap': '', 'DataSource': "It's still available as `np.lib.npyio.DataSource`.", 'nbytes': 'Use `np.dtype().itemsize` instead.', 'byte_bounds': "Now it's available under `np.lib.array_utils.byte_bounds`", 'compare_chararrays': "It's still available as `np.char.compare_chararrays`.", 'format_parser': "It's still available as `np.rec.format_parser`."}
        __expired_attributes__[attr] = 'Use `np.inf` instead.'
    398         f"`np.{attr}` was removed in the NumPy 2.0 release. "
    399         f"{__expired_attributes__[attr]}"
    400     )
    402 if attr == "chararray":
    403     warnings.warn(
    404         "`np.chararray` is deprecated and will be removed from "
    405         "the main namespace in the future. Use an array with a string "
    406         "or bytes dtype instead.", DeprecationWarning, stacklevel=2)

AttributeError: `np.Inf` was removed in the NumPy 2.0 release. Use `np.inf` instead.
```
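The trace bottoms out in `self.best = np.Inf` inside Keras's `ModelCheckpoint.__init__`, where NumPy's module-level `__getattr__` rejects the removed alias. The sketch below shows one way to check which of the infinity/NaN aliases listed in the trace's `__expired_attributes__` dict are missing from a given NumPy-like module. The names `FakeNumpy2` and `removed_aliases` are hypothetical helpers introduced for illustration, and `FakeNumpy2` is only a stand-in that imitates the behaviour seen above so the example runs without NumPy installed.

```python
class FakeNumpy2:
    """Hypothetical stand-in imitating how NumPy 2 rejects removed aliases."""

    # The lowercase spellings are the ones the error message recommends.
    inf = float("inf")
    nan = float("nan")

    def __getattr__(self, name):
        # Called only when normal lookup fails, like a module-level __getattr__.
        raise AttributeError(f"`np.{name}` was removed in the NumPy 2.0 release.")


def removed_aliases(np_module, names=("Inf", "Infinity", "NaN", "infty")):
    """Return the alias names that cannot be read from `np_module`.

    The default names are taken from the `__expired_attributes__` dict
    shown in the stack trace.
    """
    # hasattr() returns False when attribute access raises AttributeError.
    return [name for name in names if not hasattr(np_module, name)]


fake_np = FakeNumpy2()
print(removed_aliases(fake_np))
print(hasattr(fake_np, "inf"))
```

With a real installation, `removed_aliases(numpy)` can be called the same way after `import numpy`; a non-empty result suggests that code still using those spellings will fail as in the trace.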

To Reproduce

Expected behaviour

I can train cellfinder through napari.

Additional context

I can make this go away by running `pip install "numpy<2"`.

IgorTatarnikov commented 1 month ago

Should we pin to NumPy < 2.0 for now?

alessandrofelder commented 1 month ago

yes, a PR is in progress - will ask for your review shortly :grin:

IgorTatarnikov commented 4 weeks ago

This is now fixed in https://github.com/keras-team/keras/pull/20049 and released as part of 3.5.0. I tested it locally and training proceeds without errors with numpy==2.0.1 and keras==3.5.0. We can now unpin numpy, but perhaps pin keras>=3.5.0?
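A minimal sketch of the version rule this thread arrives at: per the reports above, training works with `numpy<2`, and with NumPy 2 it works once Keras is at 3.5.0 or later. The helpers `parse_version` and `numpy_keras_pair_ok` are hypothetical names, not part of cellfinder, and the rule only reflects what was tested in this thread (numpy==2.0.1 with keras==3.5.0), so other combinations are an assumption.

```python
def parse_version(text):
    """Turn a version string such as '2.0.1' into a tuple of ints, (2, 0, 1).

    Parsing stops at the first component without leading digits, so
    suffixes such as 'rc1' or '.dev0' are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_keras_pair_ok(numpy_version, keras_version):
    """Apply the rule from this thread to a pair of version strings.

    Assumption: numpy<2 avoids the `np.Inf` error regardless of the Keras
    version, and numpy>=2 needs keras>=3.5.0.
    """
    if parse_version(numpy_version) < (2,):
        return True
    return parse_version(keras_version) >= (3, 5)


print(numpy_keras_pair_ok("2.0.1", "3.5.0"))
print(numpy_keras_pair_ok("2.0.0", "3.4.1"))
```

The version strings for an existing environment can be read with `importlib.metadata.version("numpy")` and `importlib.metadata.version("keras")` from the standard library.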

IgorTatarnikov commented 3 weeks ago

We need to wait for torch 2.4.1 before we can unpin numpy on Windows.