Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
493 stars 162 forks source link

Evan readme improvement #143

Closed Martinsos closed 4 years ago

evanbiederstedt commented 4 years ago

Hi @Martinsos

Thanks for doing this, and I'm honored by the branch name! :)

This looks good. The sole thing I would add would be to clarify within the README what the arguments in the function are used for---I think that was my original motivation for the PR.

So, currently the README shows this:

align()

align(query, target, [mode], [task], [k], [additionalEqualities])
Aligns query against target with edit distance.

query and target can be strings, bytes, or any iterables of hashable objects, as long as all together they don't have more than 256 unique values.

Above, in "Features", you explain the features....but there's not a clear connection for the developer in a hurry (i.e. all of us) as to how to use the parameters.

In my branch here https://github.com/Martinsos/edlib/pull/139, I think I just pasted over the information from the python scripts into a bulleted list, i.e.

* ``mode`` --- (Default ``"NW"``, optional) Alignment method do be used. 
   - ``"NW"`` for global (default)
   - ``"HW"`` for infix
   - ``"SHW"`` for prefix
 * ``task`` --- (Default ``"distance"``, optional) Tells edlib what to calculate. The less there is to calculate, the faster it is. Possible value are (from fastest to slowest):
   - ``"distance"`` - find edit distance and end locations in target. Default.
   - ``"locations"`` - find edit distance, end locations and start locations.
   - ``"path"`` - find edit distance, start and end locations and alignment path.
 * ``k`` -- (Default ``-1``, optional) Max edit distance to search for - the lower this value, the faster is calculation. Set to -1 (default) to have no limit on edit distance.
 * ``additionalEqualities`` --- (Default ``None``, optional) List of pairs of characters or hashable objects, where each pair defines two values as equal. This way you can extend edlib's definition of equality (which is that each character is equal only to itself).This can be useful e.g. when you want edlib to be case insensitive, or if you want certain characters to act as a wildcards. Set to None (default) if you do not want to extend edlib's default equality definition.

Somehow putting that information in the README for ease-of-use is the challenge here.

Martinsos commented 4 years ago

Makes sense, @evanbiederstedt but isn't output of help(edlib.align) good enough? Maybe you just didn't see that it is being included in the README right now? I am not sure at which document are you looking at right now, is it README-tmpl.rst here in the PR, or is it README on PyPI? Because what I did is, I made it so that README.rst is generated from the README-tmpl.rst, and it also includes help() outputs for each function. So check out README on PyPI (I updated it) and check out README-tmpl.rst here in PR to see how it is done and let me know if that is all right, or you meant something on top of this.

evanbiederstedt commented 4 years ago

Hi @Martinsos

You're right, I was just looking at the README-tmpl.rst in order to figure this out.

And yes, this is a good point---help(edlib.align) certainly corrects my confusion...

Help on built-in function align in module edlib:

align(...)
    Align query with target using edit distance.
    @param {str or bytes or iterable of hashable objects} query, combined with target must have no more
           than 256 unique values
    @param {str or bytes or iterable of hashable objects} target, combined with query must have no more
           than 256 unique values
    @param {string} mode  Optional. Alignment method do be used. Possible values are:
            - 'NW' for global (default)
            - 'HW' for infix
            - 'SHW' for prefix.
    @param {string} task  Optional. Tells edlib what to calculate. The less there is to calculate,
            the faster it is. Possible value are (from fastest to slowest):
            - 'distance' - find edit distance and end locations in target. Default.
            - 'locations' - find edit distance, end locations and start locations.
            - 'path' - find edit distance, start and end locations and alignment path.
    @param {int} k  Optional. Max edit distance to search for - the lower this value,
            the faster is calculation. Set to -1 (default) to have no limit on edit distance.
    @param {list} additionalEqualities  Optional.
            List of pairs of characters or hashable objects, where each pair defines two values as equal.
            This way you can extend edlib's definition of equality (which is that each character is equal only
            to itself).
            This can be useful e.g. when you want edlib to be case insensitive, or if you want certain
            characters to act as a wildcards.
            Set to None (default) if you do not want to extend edlib's default equality definition.
    @return Dictionary with following fields:
            {int} editDistance  Integer, -1 if it is larger than k.
            {int} alphabetLength Integer, length of unique characters in 'query' and 'target'
            {[(int, int)]} locations  List of locations, in format [(start, end)].
            {string} cigar  Cigar is a standard format for alignment path.
                Here we are using extended cigar format, which uses following symbols:
                Match: '=', Insertion to target: 'I', Deletion from target: 'D', Mismatch: 'X'.
                e.g. cigar of "5=1X1=1I" means "5 matches, 1 mismatch, 1 match, 1 insertion (to target)".

Ok, as a user who goes to the github repo to figure things out on the fly, here are two additions which might make things easier (feel free to disagree!):

--- Maybe tell the reader that help(edlib.align) works. That's not always the case with python packages, and sometimes the API e.g. align.help()

--- https://github.com/Martinsos/edlib/tree/master/bindings/python

Could the README pop up when you click here?

That should clarify any remaining confusion :)

Martinsos commented 4 years ago

Good points!

Regarding help() -> in README on pypi, it both shows output of help(edlib.align) and says it was received by running help(edlib.align), so that should be good.

Regarding the README, I did not want to duplicate it here, but I did create README.md that is aimed for developers and moved "Development" section from README.rst into README.md. They also both point to each other. Now developers have entry point with relevant info, but there is also no duplication or risk of developers editing generated files.

I think that should be it! Check the latest master and also pypi page of edlib if you would like to see these changes, and let me know if there is anything else!

evanbiederstedt commented 4 years ago

HI @Martinsos

Thanks for the help here.

Regarding help() -> in README on pypi, it both shows output of help(edlib.align) and says it was received by running help(edlib.align), so that should be good.

You're right on the pypi thing. However, I do think most developers look at the github repo as opposed to checking the README on pypi. So, my only other suggestion was to clarify this in the python README.

But I understand now what you're trying to do in "removing" the python README entirely (somewhat) from the repo. That's sensible---I was just confused as the README used to exist there.

Do you think it would be beneficial to create a small section of the python API on the "main README" here?

https://github.com/Martinsos/edlib/blob/master/README.md

I hate to be difficult---I'm just thinking to myself how I often use github repos in order to avoid confusion in a few months time :)

Martinsos commented 4 years ago

@evanbiederstedt hey that is great feedback :)! So now there is README.md in bindings/python that points to README on PyPI page and also mentions README-tmpl.rst + has information for developers, so I believe that should cover anybody coming there. And in README on PyPI we do mention that help(edlib.align) can be called + we show it's output. So the only thing left is if somebody comes directly to the main README.md which is at the root of the whole repository. That one does already mention that there is python package and it links to it, but I could make that link a little bit more prominent.

Am I getting this right, or am I missing smth that you are telling me? Is there something else that you would like to do, and what exactly? Creating section of Python API on the main README.md, in root of the project repository -> that would be duplication so I would prefer not doing that, but I can make it more obvious there is python package and link to it, would that be fine?

Martinsos commented 4 years ago

I moved sentence mentioning Python up in main README.md and also gave link to piece of repo that contains python code. I feel adding more will mess up the structure of the main README.md hm! Although I do understand that you, as python package user, want to see more of Python stuff when you come to github page. Maybe I could go with smth like: here is link to docs for python, here is link for C/C++. Uf though decisions :D! I am leaning toward leaving it like this for now and maybe experimenting more in the future.

evanbiederstedt commented 4 years ago

Maybe I could go with smth like: here is link to docs for python, here is link for C/C++. Uf though decisions

This seems too complicated, I think.

As a user, I just want concise information in the README---I think you could add a subsection in the main README about python, or linking to the PyPi README perhaps. Just something to avoid unnecessary confusion :)

evanbiederstedt commented 4 years ago

Also, I should clarify that, I think the PyPI README is great, and clarify everything. https://pypi.org/project/edlib/

Linking to this in the main README solves the confusion entirely I think---you could certainly remove the README.md from the bindings/python subdirectory.

Martinsos commented 4 years ago

Ok great that we have PyPI README being good :D!

So there is a link to PyPI from the main README right now, is that ok then? Or is it hard to see? It is that badge that says pypi ... .

README.md in bindings/python -> I would actually leave that one there because it gives the idea of "ok this is a root of a project" and it provides development information + links to PyPI docs. When I am working on python binding, I usually go there to take a look at README, to remind myself how to build the project and similar.

On Wed, Jan 22, 2020 at 2:29 AM evanbiederstedt notifications@github.com wrote:

Also, I should clarify that, I think the PyPI README is great, and clarify everything. https://pypi.org/project/edlib/

Linking to this in the main README solves the confusion entirely I think---you could certainly remove the README.md from the bindings/python subdirectory.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Martinsos/edlib/pull/143?email_source=notifications&email_token=AALXFB6M3I3TYCG4ZZLC7TTQ66OOJA5CNFSM4KH7PJD2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJR43AY#issuecomment-576966019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALXFB3TZ4O5MORJC547FHDQ66OOJANCNFSM4KH7PJDQ .

evanbiederstedt commented 4 years ago

So there is a link to PyPI from the main README right now, is that ok then? Or is it hard to see? It is that badge that says pypi ... .

It could be clearer, I think. The issue is, not all PyPI sites have such a clear README, so I rarely go there. I think even a statement like "for the README, click here" would clarify things.

README.md in bindings/python -> I would actually leave that one there because it gives the idea of "ok this is a root of a project" and it provides development information + links to PyPI docs. When I am working on python binding, I usually go there to take a look at README, to remind myself how to build the project and similar.

The only problem with this is that, when I go to bindings/python, and I see the README there...I'm confused it's not given me the information I want immediately (which is what I want from the README). That being said, the statement "This README contains only development information, you can check out full README (README.rst) for the latest version of Edlib python package on Edlib's PyPI page." clarifies things greatly!

Martinsos commented 4 years ago

Ok, step by step we are getting there :D! I added "Click here for README" to the main README, just next to the badge, as you suggested. Regarding README in bindings/python -> I get what you are saying. But I really want to avoid duplication because it makes the maintenance harder!

Ok great, I think this should be it now! Thanks for all the help :).

On Wed, Jan 22, 2020 at 1:34 PM evanbiederstedt notifications@github.com wrote:

So there is a link to PyPI from the main README right now, is that ok then? Or is it hard to see? It is that badge that says pypi ... .

It could be clearer, I think. The issue is, not all PyPI sites have such a clear README, so I rarely go there. I think even a statement like "for the README, click here" would clarify things.

README.md in bindings/python -> I would actually leave that one there because it gives the idea of "ok this is a root of a project" and it provides development information + links to PyPI docs. When I am working on python binding, I usually go there to take a look at README, to remind myself how to build the project and similar.

The only problem with this is that, when I go to bindings/python, and I see the README there...I'm confused it's not given me the information I want immediately (which is what I want from the README). That being said, the statement "This README contains only development information, you can check out full README (README.rst) for the latest version of Edlib python package on Edlib's PyPI page." clarifies things greatly!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Martinsos/edlib/pull/143?email_source=notifications&email_token=AALXFB6QD32J5TMHGPAURALQ7A4O3A5CNFSM4KH7PJD2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJTMIII#issuecomment-577160225, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALXFB4OYUHNTHBIYRZLSGDQ7A4O3ANCNFSM4KH7PJDQ .

evanbiederstedt commented 4 years ago

I think this clarifies any possible remaining confusion @Martinsos

I really appreciate the help!