MannLabs / alphabase

Infrastructure of AlphaX ecosystem
https://alphabase.readthedocs.io
Apache License 2.0
31 stars 9 forks source link

FIX match only dense dataframes #145

Closed GeorgWa closed 8 months ago

GeorgWa commented 8 months ago

SpecLibBase.available_fragment_dfs() would return ['_fragment_df', '_fragment_intensity_df', '_fragment_mz_df'] on a SpecLibFlat and fail on .remove_unused_fragments().

This makes sure it's really matching _fragment_[attribute_name]_df.

jalew188 commented 8 months ago

Looks nice

GeorgWa commented 8 months ago

I discussed the topic a bit with @mo-sameh and we reasoned that there is currently some ambiguity between a SpecLibBase and SpecLibFlat. As the SpecLibFlat is inherited and produced by a SpecLibBase it can still have dense representations (fragment_mz_df, fragment_intensity_df etc.) and inherited functions like remove_unused_fragments which don't operate on the flat representation. This is confusing as one would expect they affect your flat library.

Therefore I made some small changes to this PR:

  1. Renaming available_fragment_dfs to available_dense_fragment_dfs to make clear this only returns dense fragments matrices.
  2. Returning an empty list on SpecLibFlat.available_dense_fragment_dfs to make clear that dense representations are not supported as part of a SpecLibFlat.
  3. Get rid of dense representations when SpecLibBase.parse_base_library is called. If keep_original_frag_dfs is set to true, a deprecation warning is issued.
  4. Raise a not implemented error if SpecLibFlat.remove_unused_fragments is called.

I think to handle complexity we have to keep the different library types better separated.

Looking forward to hear your thoughts

review-notebook-app[bot] commented 8 months ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

jalew188 commented 8 months ago

We should use SpecLibBase as the composite instead of the parent class.

mo-sameh commented 8 months ago

The save/load_hdf functions in flat.py are still using the dense DataFrames. I agree that there's a need for refactoring the class structures and their relationships. Apart from that, for this PR everything looks good to me.

GeorgWa commented 8 months ago

I agree, or we should have a base class which only handles precursor operations and have this as parent class for SpecLibBase and SpecLibFlat. I think this would be a separate bigger refactoring involving Magnus.

jalew188 commented 8 months ago

Yes, let's keep as it is now. You can merge I think