compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Add user control for whether models are downloaded automatically #135

Open vrkosk opened 3 months ago

vrkosk commented 3 months ago

Currently, when MS2PIPFeatureGenerator calls ms2pip.correlate(), the latter may download xgboost_models_files from the Internet if they are not present in the model_dir directory. Could we please have an option to disable this behaviour?

In the meantime, downloads can be blocked by monkey patching. At the top of your script, before importing anything from MS2Rescore or MS2PIP, add these lines:

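import urllib.error
import urllib.request

# Stub that always fails, so no model file can be fetched from the Internet.
def _raise_urllib_exception(*args, **kwargs):
    raise urllib.error.URLError("Downloading from the Internet is prevented in this build.")

urllib.request.urlretrieve = _raise_urllib_exception

# Import MS2Rescore/MS2PIP only after the patch is in place.
from ms2rescore.feature_generators.ms2pip import MS2PIPFeatureGenerator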

The targeted code is _download_model() in ms2pip/_utils/xgb_models.py, which calls urllib.request.urlretrieve. The trick relies on Python caching every imported module in sys.modules: there is only ever one urllib.request module object, so patching it before ms2pip is imported guarantees that ms2pip sees the patched function.

Obviously, this is rather poor practice, and an option passed to MS2PIPFeatureGenerator would be much nicer.

RalfG commented 3 months ago

Hi @vrkosk,

Thanks for the suggestion! We will look into adding this to MS²PIP as a simple option, so that a monkey patch is no longer required.

If auto-download is disabled and the required model is missing from the directory, would you expect an exception to be raised?

Best, Ralf

vrkosk commented 3 months ago

Thank you for considering it! Any way to signal the error is fine as long as it's in the MS2PIPFeatureGenerator documentation.
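For illustration, the behaviour discussed in this thread (use the local model if present, otherwise download it, unless downloading is disabled, in which case raise an exception) could look roughly like the sketch below. This is not MS²PIP's or MS²Rescore's API: ensure_model_file, its download flag and ModelNotFoundError are all hypothetical names chosen for this example.

```python
import os
import urllib.request


class ModelNotFoundError(FileNotFoundError):
    """Hypothetical error: model file missing and downloading disabled."""


def ensure_model_file(filename: str, model_dir: str, url: str, download: bool = True) -> str:
    """Return the path to a model file, downloading it only if allowed (hypothetical helper)."""
    path = os.path.join(model_dir, filename)
    if os.path.isfile(path):
        # Model already present locally: never touch the network.
        return path
    if not download:
        # Auto-download disabled: fail loudly instead of fetching the file.
        raise ModelNotFoundError(
            f"Model file {filename!r} not found in {model_dir!r} "
            "and automatic download is disabled."
        )
    os.makedirs(model_dir, exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return path
```

Subclassing FileNotFoundError is one possible design choice: callers that already handle missing files keep working, while the dedicated subclass can be documented (as requested above) and caught specifically.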