JamesHWade / gpttools

gpttools extends gptstudio for package development to help you document code, write tests, or even explain code
https://jameshwade.github.io/gpttools/

Limited number of characters printed with add roxygen skeleton? #10

Closed GillesSanMartin closed 1 year ago

GillesSanMartin commented 1 year ago

I tested "add roxygen to function" with gpttools on a function that starts like this:

blastn <- function(db, # reference database in fasta format
                   query, # sequences to blast in fasta format
                   out = "Blast_output.tsv",
                   outfmt = "6 qacc saccver evalue bitscore length pident",
                   retrieve_nonmatching = TRUE, #
                   num_threads = detectCores(),
                   max_target_seqs = 20,
                   evalue = 10,
                   verbose = 0,  # 0 for no messages, 1 for time used, 2 for blast DB creation messages
                   clean = FALSE # remove the files created (blast db based on a fasta file)
){
(...)
}

This is the result I obtained:

#' Blast nucleotide sequences
#' 
#' @param db reference database in fasta format
#' @param query sequences to blast in fasta format
#' @param out output file
#' @param outfmt output format
#' @param retrieve_nonmatching if true, retrieve non matching sequences
#' @param num_threads number of threads used
#' @param max_target_seqs maximum number of target

So it is very short, and it does not even provide an entry for every parameter. If I request the same task directly in ChatGPT, I get a much more in-depth reply with entries for all parameters, including their default values:

#' Blast nucleotide sequences
#'
#' This function allows to run a blastn search on a reference database. If the reference database is a fasta file, the function will create a blast database. 
#' If `retrieve_nonmatching` is set to `TRUE`, the function will retrieve non-matching sequences from the query file and add them to the blast output.
#' Optionally, the function can remove the files created by the blast search (blast database based on a fasta file).
#'
#' @param db a character string giving the path to the reference database in fasta format.
#' @param query a character string giving the path to the sequences to blast in fasta format.
#' @param out a character string giving the path to the output file. Default is "Blast_output.tsv".
#' @param outfmt a character string giving the format of the output. Default is "6 qacc saccver evalue bitscore length pident".
#' @param retrieve_nonmatching a logical indicating whether to retrieve non-matching sequences from the query file and add them to the blast output. Default is `TRUE`.
#' @param num_threads an integer giving the number of threads to use. Default is the number of cores detected on the machine.
#' @param max_target_seqs an integer giving the maximum number of target sequences to report for each query. Default is 20.
#' @param evalue a numeric giving the maximum expectation value to report. Default is 10.
#' @param verbose an integer indicating the level of messages to print. 0 for no messages, 1 for time used, 2 for blast DB creation messages. Default is 0.
#' @param clean a logical indicating whether to remove the files created by the blast search (blast database based on a fasta file). Default is `FALSE`.
#' @return invisible NULL.
#' @export
#' @examples
#' blastn(db = "reference.fasta", query = "sequences.fasta", out = "blast_output.tsv")

This is a much better starting point to document the function. I tested with several other large functions and I always get what looks like truncated answers from gpttools.
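
One quick way to confirm that a generated skeleton was cut off is to compare the function's arguments with the `@param` tags that actually came back. The following is only a minimal sketch in base R: `missing_params()` and `blastn_stub()` are hypothetical names made up for this illustration (they are not part of gpttools), and the stub keeps only a few of the arguments from the function above.

```r
# Hypothetical helper: list function arguments that have no @param entry
# in a block of roxygen text (a sign the response may have been cut off).
missing_params <- function(fn, roxygen_text) {
  # Arguments declared in the function signature
  args <- names(formals(fn))
  # Split the text into lines and keep only the @param tags
  lines <- unlist(strsplit(roxygen_text, "\n", fixed = TRUE))
  param_lines <- grep("^#'\\s*@param\\s+", lines, value = TRUE)
  # Extract the parameter name that follows each @param tag
  documented <- sub("^#'\\s*@param\\s+([^[:space:]]+).*$", "\\1", param_lines)
  setdiff(args, documented)
}

# Toy stand-in for the function in this issue (body omitted on purpose)
blastn_stub <- function(db, query, out = "Blast_output.tsv", evalue = 10,
                        verbose = 0, clean = FALSE) NULL

# A shortened response that stops after the third parameter
truncated <- paste(
  "#' Blast nucleotide sequences",
  "#'",
  "#' @param db reference database in fasta format",
  "#' @param query sequences to blast in fasta format",
  "#' @param out output file",
  sep = "\n"
)

# Arguments of blastn_stub that the truncated text never documents
missing_params(blastn_stub, truncated)
```

The sketch assumes one parameter name per `@param` tag; roxygen's comma-separated form (`@param x,y`) would need extra handling.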

Is there a parameter I can adjust to make the AI more talkative?

JamesHWade commented 1 year ago

I've been stingy with token use in most functions to keep costs down, but I'll add some user customization. ChatGPT has proven to be an overall better model for this sort of thing. Once GPT-4 comes out, you should see better responses from the addins.

Thank you for sharing these examples!

JamesHWade commented 1 year ago

#12 increases the default max tokens and tells the user how to change it. I hope this helps with the incomplete responses!
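
To get a feel for how large a token limit a full roxygen block might need, a rough estimate can be sketched in base R. This is an illustration only: it assumes the common rule of thumb of about four characters per token for English text rather than using a real tokenizer, and `estimate_tokens()` and `fits_budget()` are hypothetical names, not gpttools functions.

```r
# Rough heuristic (an assumption, not a real tokenizer):
# about 4 characters per token for English text.
estimate_tokens <- function(text, chars_per_token = 4) {
  ceiling(sum(nchar(text)) / chars_per_token)
}

# Does an estimated response fit within a given token limit?
fits_budget <- function(text, max_tokens) {
  estimate_tokens(text) <= max_tokens
}

# One of the longer @param lines from the ChatGPT output in this issue
long_param <- paste0(
  "#' @param db a character string giving the path to the ",
  "reference database in fasta format."
)

# Approximate cost of one such line, and whether ten of them
# would fit in a hypothetical limit of 100 tokens
estimate_tokens(long_param)
fits_budget(rep(long_param, 10), max_tokens = 100)
```

Because the estimate is approximate, it is safer to leave some headroom above the number it suggests.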