MannLabs / alphabase

Infrastructure of AlphaX ecosystem
https://alphabase.readthedocs.io
Apache License 2.0
31 stars 9 forks source link

Docs for main functions and classes in AlphaBase #20

Closed jalew188 closed 2 years ago

jalew188 commented 2 years ago

After this PR, we may not have big PRs for alphabase unless we have more common features to add, for example: fasta/protein processing, and Sander's AlphaUtils/Performance codes.

So no hurries to review this PR, I will merge this PR until we want to officially release AlphaBase @swillems @straussmaximilian

straussmaximilian commented 2 years ago

Hi, overall docstrings improved quite a bit.

I think we can still improve on the Notebooks, e.g. with a Small header on each notebook that explains what can be found there. E.g. I would not for what calc_precursor_isotope is being used. Why is it only giving two intensities? Maybe we could start drawing a graphical representation of the classes/ functions of the public methods as sort of an API documentation.

Test consitency: There is often a mix and match for the tests, sometimes it is within a def test_ function, sometimes it is just asserts in a hdiden function.

General other comments (maybe out of the scope for this PR) to consider?

Perspective: In principle we have most of the fragment calculation in there. When would be a good time to replace the respective codebase in AlphaPept for this?

jalew188 commented 2 years ago

Hi, overall docstrings improved quite a bit.

I think we can still improve on the Notebooks, e.g. with a Small header on each notebook that explains what can be found there. E.g. I would not for what calc_precursor_isotope is being used. Why is it only giving two intensities? Maybe we could start drawing a graphical representation of the classes/ functions of the public methods as sort of an API documentation.

Thank you for quick reviewing. There may be more docs in notebooks before nbdev_build_docs. For calc_precursor_isotope, Sander asked to add isotope 1 into the DIA library, so I think give him 2 isotopes will be more convenient for him. It makes sense as sometimes, M1 or even M2 is more abundance than M0 (mono).

I did not think too much on this API, we can of course provide more isotopes.

Test consitency: There is often a mix and match for the tests, sometimes it is within a def test_ function, sometimes it is just asserts in a hdiden function.

Yes, legacy problems. After writing some test functions, I thought if we use nbs, we don't need test functions...

General other comments (maybe out of the scope for this PR) to consider?

  • For some methods I am not sure how they will scale, e.g. what if I have a very large FASTA, maybe we can have maximum sizes?

Good points. Generating Fasta library online requires computational powers, maybe it is better to use pre-build libraries for users. It not easy to control the size, for example, if we add phospho into a library, the library will be nearly 3-10 times larger than unmodified libraries. And the size is not sure for other PTMs...

  • When using high-level objects such as precursor_df, we could raise errors for the API functions when they do not have respective attributes. This would help a lot calling them.

Yes, as precursor_df's life time is as long as the software execution time, it must be checked before we use it. I am still not sure where is the best place to do the checks. But we can always catch the 'KeyError' or 'AttributeError' raised by pandas when using it.

Perspective: In principle we have most of the fragment calculation in there. When would be a good time to replace the respective codebase in AlphaPept for this?

Could do very soon:)