MannLabs / alphapeptdeep

Deep learning framework for proteomics
Apache License 2.0

ModelManager.predict_all() modifies dataframe in place #163

Open GeorgWa opened 5 months ago

GeorgWa commented 5 months ago

Describe the bug

This is not really a bug, but rather unexpected behaviour. The use case is a filtered spectral library which does not have the expected precursor and fragment order. It would be good to know what the expected precursor and fragment order is.

To Reproduce

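import pandas as pd
from alphabase.spectral_library.base import SpecLibBase
from peptdeep.pretrained_models import ModelManager

speclib = SpecLibBase()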
speclib.precursor_df = pd.DataFrame([
    {'sequence': 'PEPTIDEK', 'charge':2, 'mods': '', 'mod_sites': ''},
    {'sequence': 'MYCMENK', 'charge':2, 'mods': '', 'mod_sites': ''},
    {'sequence': 'IDEK', 'charge':3, 'mods': '', 'mod_sites': ''},
    {'sequence': 'PELLPTIDEK', 'charge':3, 'mods': '', 'mod_sites': ''},
])
speclib.calc_fragment_mz_df()

speclib.precursor_df = speclib.precursor_df[speclib.precursor_df['charge']==3]

print('before: ',speclib.precursor_df['frag_start_idx'].values)
model_manager = ModelManager(
    device="mps",
)
_ = model_manager.predict_all(speclib.precursor_df, predict_items=['ms2'])
print('after: ',speclib.precursor_df['frag_start_idx'].values)

Results

before:  [ 0 16]
2024-05-07 09:55:17> Predicting MS2 ...
100%|██████████| 2/2 [00:00<00:00, 94.23it/s]
after:  [0 3]

Expected behavior

Since ModelManager.predict_all() returns a DataFrame, precursor_df would be expected to remain unchanged.
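
A possible workaround until this changes is to pass a copy, so that any in-place edits land on the copy rather than on speclib.precursor_df. The sketch below uses only pandas. predict_like is a hypothetical stand-in that mimics the renumbering seen in the output above; it is not the peptdeep API, and the "one fragment row per peptide bond" rule is an assumption taken from the reported numbers.

```python
import pandas as pd


def predict_like(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a call that mutates its argument.

    It re-sorts by nAA and renumbers frag_start_idx in place. This is an
    assumption modelled on the output reported in this issue.
    """
    precursor_df.sort_values("nAA", inplace=True)
    # Assumed layout: nAA - 1 fragment rows per precursor.
    n_frag = precursor_df["nAA"] - 1
    precursor_df["frag_start_idx"] = n_frag.cumsum() - n_frag
    return precursor_df


precursor_df = pd.DataFrame(
    {
        "sequence": ["IDEK", "PELLPTIDEK"],
        "nAA": [4, 10],
        "frag_start_idx": [0, 16],
    }
)

# Passing a copy keeps the caller's dataframe untouched.
predicted = predict_like(precursor_df.copy())

print("original:", precursor_df["frag_start_idx"].values)
print("returned:", predicted["frag_start_idx"].values)
```

With the copy, the original frag_start_idx values stay at 0 and 16, while the returned dataframe carries the renumbered values 0 and 3. In the reproduction above, the equivalent change would be calling predict_all(speclib.precursor_df.copy(), ...).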

stratomaster31 commented 4 months ago

The thing is that precursor_df is sorted in place by nAA, which I assume is done for computational speed.

Apart from altering precursor_df, the predictions are also not returned in the original order.
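
If the original order matters, one option is to tag each row with its position before the call and sort on that tag afterwards. This is only a sketch in plain pandas: sorted_predict is a hypothetical stand-in for a function that sorts by nAA and resets the index, and the orig_order and rt_pred columns are made up for illustration.

```python
import pandas as pd


def sorted_predict(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: sorts by nAA, resets the index, adds a column."""
    out = df.sort_values("nAA").reset_index(drop=True)
    out["rt_pred"] = out["nAA"] * 0.1  # placeholder value, not a real prediction
    return out


precursor_df = pd.DataFrame(
    {"sequence": ["PEPTIDEK", "IDEK", "MYCMENK"], "nAA": [8, 4, 7]}
)

# Tag each row with its original position before handing the data over.
tagged = precursor_df.copy()
tagged["orig_order"] = range(len(tagged))

result = sorted_predict(tagged)

# Undo the reordering by sorting on the tag, then drop the helper column.
restored = (
    result.sort_values("orig_order")
    .drop(columns="orig_order")
    .reset_index(drop=True)
)

print(restored["sequence"].tolist())
```

An explicit helper column is used instead of the dataframe index because a function that resets the index would discard the original labels. Per-row columns move together with their row, so the restored frame keeps each prediction next to the right sequence.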