MichaelisTrofficus / gpt4docstrings

Generating Python docstrings with OpenAI ChatGPT!!
https://gpt4docstrings.readthedocs.io
MIT License
116 stars 11 forks source link

merge existing docstring with new generated docstring #31

Open uprokevin opened 1 year ago

uprokevin commented 1 year ago

Thanks for the great library.

Instead of replacing the existing docstring, please do this:

docnew = doc_gpt4 + "\n" + doc_existing

That way, the user can decide afterwards whether to keep their own docstring content. We often put valuable information in docstrings, such as code samples.

uprokevin commented 1 year ago

The code to update is here:

https://github.com/MichaelisTrofficus/gpt4docstrings/blob/master/src/gpt4docstrings/generate_docstrings.py#L146

We need to fetch the existing docstring and add some basic logic to merge it: compare the new docstring with the existing one (string comparison).
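
A minimal sketch of the append idea, assuming the existing docstring is read with the standard `ast` module. `append_existing_docstring` is a hypothetical helper for illustration, not part of gpt4docstrings, and the generated text is a hard-coded stand-in for model output:

```python
import ast


def append_existing_docstring(source: str, name: str, generated: str) -> str:
    """Return `generated` followed by the docstring already present on `name`.

    Hypothetical helper: not part of gpt4docstrings.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_target = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_target and node.name == name:
            existing = ast.get_docstring(node)  # None when there is no docstring
            if not existing:
                return generated
            # Same shape as the proposal: new docstring first, old one below it.
            return generated + "\n" + existing
    raise ValueError(f"{name!r} not found in source")


source = '''
def add(a, b):
    """Example: add(1, 2) == 3"""
    return a + b
'''

# "Add two numbers." stands in for the model-generated docstring.
merged = append_existing_docstring(source, "add", "Add two numbers.")
print(merged)
```

This only computes the merged text; writing it back into the file is a separate step.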

MichaelisTrofficus commented 1 year ago

Hi! Thanks for the suggestion :)

My idea was to provide two options:

  1. Don't update the docstrings: In this case gpt4docstrings will only generate docstrings for functions / classes without docstrings.
  2. Update the docstrings: Generate docstrings for functions / classes even if they already have.

But what you are saying seems reasonable to me. What I think I'll do is to create another configuration:

  1. Merge / Concatenate the docstrings: a third option in case you want to retain your own docstrings.

arita37 commented 1 year ago

Yes, having different update modes is good:

update_mode = None ## only generate new docstrings
update_mode = "overwrite" ## Dangerous: overwrites all docstrings...
update_mode = "append" ## Append the old docstring at the bottom of the newly generated one.
update_mode = "merge" ## Merge new and old in a smart way...
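
The modes listed above could be dispatched roughly as follows. This is only a sketch under assumptions: `resolve_docstring` is a hypothetical name, nothing here reflects the actual gpt4docstrings configuration, and "merge" is left unimplemented because the thread does not define what "smart" means:

```python
from typing import Optional


def resolve_docstring(
    existing: Optional[str],
    generated: str,
    update_mode: Optional[str] = None,
) -> str:
    """Pick the final docstring for one function or class (hypothetical helper)."""
    if not existing:
        # Nothing to preserve, so every mode uses the generated docstring.
        return generated
    if update_mode is None:
        # Only fill in missing docstrings; existing ones are left untouched.
        return existing
    if update_mode == "overwrite":
        return generated
    if update_mode == "append":
        # Old docstring goes at the bottom of the newly generated one.
        return generated + "\n" + existing
    if update_mode == "merge":
        raise NotImplementedError("'merge' needs a smarter strategy")
    raise ValueError(f"unknown update_mode: {update_mode!r}")


print(resolve_docstring("Old notes.", "New summary.", update_mode="append"))
```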

arita37 commented 1 year ago

Think of the case of adding docstrings to an existing codebase...

arita37 commented 1 year ago

@MichaelisTrofficus :

Any thoughts on implementing this ?

MichaelisTrofficus commented 1 year ago

Hello! Sorry for the delay. I've been working a lot on other projects over the summer and couldn't put any time into this project. But I'm back, so I'll be implementing these features as well as additional ones (translating between docstring styles, e.g. from numpydoc to Google).

arita37 commented 1 year ago

Thanks! Most Python code already has some docstrings, so merging with the existing ones will help avoid losing information.

(Email reply to Miguel Otero Pedrido's comment of Oct 9, 2023, 18:40.)

MichaelisTrofficus commented 1 year ago

Hi @arita37! I don't know if you've been keeping track of the new versions of the library, but there have been a lot, hahah. Right now, you have two options when generating docstrings.

The first option is to update the file in place (overwriting the file, basically). This option will only generate docstrings when no docstrings are found for the class / function at hand.

The second option does not overwrite the file, but it creates a git patch, with all the proposed docstrings. I have added detailed documentation in the README.md if you want to take a look at it.

My idea is that, by having this patch file, the developer will be able to decide which changes he wants to add to the original file.
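
To illustrate the patch idea in general terms (this is not how gpt4docstrings builds its patch, which is described in the README), a unified diff between the original source and a proposed version can be produced with the standard `difflib` module. `make_patch` and the file name `example.py` are hypothetical:

```python
import difflib


def make_patch(original: str, proposed: str, path: str) -> str:
    """Build a unified diff for one file (hypothetical helper)."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


original = "def add(a, b):\n    return a + b\n"
proposed = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'

patch = make_patch(original, proposed, "example.py")
print(patch)
```

The developer can then review the diff and keep only the hunks they want.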

(Btw, there's also another experimental feature, which lets you translate between docstring styles, e.g. from numpy docstrings into google docstrings; I've also added documentation about it in the Example section of the README.md)

Let me know what you think 👍

arita37 commented 1 year ago

Hello,

Thanks for this.

For my part, I will not use it, since it increases my workload… The goal of the tool is to automate as many manual steps as possible.

In reality, what happens is:

The developer already has a codebase with incomplete docstrings (or some arguments are missing).

It is better to append the GPT docstring directly to the existing one.

I think appending should not be too difficult…(?)

(Email reply to Miguel Otero Pedrido's comment of Oct 30, 2023, 17:44.)

MichaelisTrofficus commented 1 year ago

But then you'll also have to reformat the docstring, right? I mean, if you have an incomplete docstring and I append the new docstring to the existing one, you'll still need to reformat the docstring. The other option is to send the function with your incomplete docstring and use that in the prompt. Could you send me an example, please? Because I think that's the way to go: take the information you already have in place and enrich it with gpt4docstrings.

arita37 commented 1 year ago

Correcting the appended docstring in the same file/repo is faster than managing a git merge across two repos (the original one and one with the new docstrings)!

(Email reply to Miguel Otero Pedrido's comment of Oct 30, 2023, 20:10.)

arita37 commented 1 year ago

We can add a small text-similarity check (threshold-based) to prevent appending near-duplicates:

https://pypi.org/project/textdistance/
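
A rough sketch of such a threshold check. To stay dependency-free it uses `difflib.SequenceMatcher` from the standard library as a stand-in for textdistance; the function names and the 0.8 threshold are assumptions chosen for illustration:

```python
from difflib import SequenceMatcher


def is_near_duplicate(existing: str, generated: str, threshold: float = 0.8) -> bool:
    """True when the two docstrings are similar enough to count as duplicates."""
    ratio = SequenceMatcher(
        None, existing.strip().lower(), generated.strip().lower()
    ).ratio()
    return ratio >= threshold


def append_unless_duplicate(existing: str, generated: str, threshold: float = 0.8) -> str:
    """Append the old docstring only when it adds something new."""
    if is_near_duplicate(existing, generated, threshold):
        return generated
    return generated + "\n" + existing


print(is_near_duplicate("Sum two integers.", "Sum two integers."))
print(append_unless_duplicate("See the wiki for usage examples.", "Sum two integers."))
```

The threshold would need tuning on real docstrings, since a short summary and a long generated docstring can overlap in content while still scoring low.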

(Email reply to Miguel Otero Pedrido's comment of Oct 30, 2023, 20:10.)

MichaelisTrofficus commented 1 year ago

I was also thinking about relying on gpt-3.5-turbo for this. For example, suppose I have this function:

def dummy(a, b):
    """
    This function sums two integers. It will also raise
    MyCustomException if `a` is bigger than 100.
    """
    return a + b

If I ask gpt-3.5-turbo to create numpy docstrings for this function, it will already take into consideration the provided information. In this case, I'll get this:

def dummy(a, b):
    """
    Sum two integers.

    This function takes two integer inputs `a` and `b` and returns their sum. It also includes exception handling
    to raise `MyCustomException` if `a` is greater than 100.

    Parameters
    ----------
    a : int
        The first integer to be added.

    b : int
        The second integer to be added.

    Returns
    -------
    int
        The sum of `a` and `b`.

    Raises
    ------
    MyCustomException
        If `a` is greater than 100.

    Examples
    --------
    >>> dummy(10, 20)
    30

    >>> dummy(110, 5)
    Traceback (most recent call last):
        ...
    MyCustomException: 'a' is greater than 100
    """
    if a > 100:
        raise MyCustomException("'a' is greater than 100")
    return a + b

arita37 commented 1 year ago

OK, it makes sense to ask GPT-3.5 to include the existing docstring.

I think we need to customize the prompt to make the integration explicit.

That way, the user's manual work is limited.

We just need to confirm whether the user wants to integrate the existing docstring.
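
One way the prompt customization could look, as a sketch only: `build_prompt`, its `use_existing` flag, and the prompt wording are all assumptions, not the prompt gpt4docstrings actually sends. No API call is made here; the function only builds the text:

```python
import ast
import textwrap


def build_prompt(function_source: str, style: str = "numpy", use_existing: bool = True) -> str:
    """Build a docstring-generation prompt (hypothetical helper and wording)."""
    source = textwrap.dedent(function_source).strip()
    node = ast.parse(source).body[0]
    if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        raise ValueError("expected the source of a single function or class")
    existing = ast.get_docstring(node) if use_existing else None

    parts = [f"Write a {style}-style docstring for the Python code below."]
    if existing:
        # Make the integration explicit instead of hoping the model notices it.
        parts.append(
            "Keep every fact stated in the existing docstring "
            "(examples, exceptions, caveats) and fold it into the new one:"
        )
        parts.append(existing)
    parts.append("Code:")
    parts.append(source)
    return "\n\n".join(parts)


dummy_source = '''
def dummy(a, b):
    """
    This function sums two integers. It will also raise
    MyCustomException if `a` is bigger than 100.
    """
    return a + b
'''

prompt = build_prompt(dummy_source)
print(prompt)
```

Passing `use_existing=False` skips the explicit instruction, which matches the idea of asking the user whether they want the existing docstring integrated.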

(Email reply to Miguel Otero Pedrido's comment of Oct 30, 2023, 20:25.)

Naggafin commented 2 months ago

Any movement on this? I'd also like to see the feature of adding the existing docstring as input for the GPT model to factor into its final output.