BioGenies / tidysq

tidy processing of biological sequences in R
https://BioGenies.github.io/tidysq/

Thoughts on PepTools #4

leonjessen opened this issue 5 years ago

leonjessen commented 5 years ago

So, my original idea for PepTools was a small, super lightweight, dependency-free (i.e. base R only) toolbox for working with peptide data (which is what we work with in our group). E.g.

At the same time, I wanted to use it in my teaching ("Immunological Bioinformatics" and "R for Bio Data Science").

Some of the functions would be simple wrappers, primarily to match the terminology of bioinformatics, e.g.

PepTools2::pep_split
function(pep){
  # Check input
  pep_check(pep = pep)
  # Convert to a matrix:
  # strsplit() returns a list with one character vector per peptide, and
  # do.call() passes that list to rbind() as separate arguments, so each
  # peptide becomes one row of a character matrix
  return( do.call(what = rbind, args = strsplit(x = pep, split = '')) )
}
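Because pep_check() is internal to PepTools2 and its checks are not shown here, below is a self-contained base-R sketch of the same wrapper idea. The names pep_check_sketch() and pep_split_sketch() are hypothetical, and requiring equal-length peptides is an assumption about what the check should enforce.

```r
# Hypothetical stand-in for PepTools2's internal pep_check(); the real
# checks are not shown in this thread. Here we only require a character
# vector of peptides that all have the same length.
pep_check_sketch <- function(pep) {
  if (!is.character(pep)) stop("'pep' must be a character vector")
  if (length(unique(nchar(pep))) != 1) stop("all peptides must have equal length")
  invisible(TRUE)
}

# Hypothetical sketch of the pep_split() wrapper: one row per peptide,
# one column per position
pep_split_sketch <- function(pep) {
  pep_check_sketch(pep)
  # strsplit() gives one character vector per peptide;
  # do.call(rbind, ...) stacks them as the rows of a matrix
  do.call(what = rbind, args = strsplit(x = pep, split = ""))
}

peps <- c("LLTDAQRIV", "YLQPRTFLL")
m <- pep_split_sketch(peps)
dim(m)   # 2 9
m[, 1]   # "L" "Y"
```

The equal-length check matters because rbind() recycles shorter vectors up to the longest one (warning only when the lengths are not multiples of each other), so ragged input would otherwise produce a misleading matrix instead of an error.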

and then also include standard data, such as PepTools2::BLOSUM62 and PepTools2::BLOSUM50, the natural background frequencies PepTools2::BGFREQS and the example peptides PepTools2::PEPTIDES. Furthermore, the ggseqlogo package is quite nice, but it only supports simple Shannon entropy-based logos, which are sub-optimal compared to Kullback-Leibler logos. So basically, I wanted to add the ability to compute PSSMs matching the functionality of Seq2Logo; these matrices could then be visualised using the custom functionality of ggseqlogo. Lastly, my intention was to name all functions with the prefix pep_.
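To make the Shannon vs Kullback-Leibler distinction concrete, here is a minimal base-R sketch of a per-position KL score, D(p || q) = sum over residues a of p_a * log2(p_a / q_a), with letter heights p_a times that column score. The helper names (aa20, uniform_bg, pfm(), kl_logo()) are hypothetical, the uniform background is an assumption standing in for PepTools2::BGFREQS, and this is a textbook definition rather than Seq2Logo's exact implementation.

```r
# Hypothetical helpers (not part of PepTools2 or tidysq): a textbook
# Kullback-Leibler column score for peptide logos, in base R only.

# The 20 standard amino acids
aa20 <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# Assumed uniform background; PepTools2::BGFREQS would be the natural
# replacement, but its layout is not shown in this thread.
uniform_bg <- setNames(rep(1 / 20, 20), aa20)

# Position frequency matrix: rows = residues, columns = peptide positions.
# 'pseudo' is a plain additive pseudocount (an assumption; Seq2Logo uses
# its own pseudocount and sequence-weighting schemes).
pfm <- function(pep, alphabet, pseudo = 0) {
  m <- do.call(rbind, strsplit(pep, ""))
  counts <- vapply(seq_len(ncol(m)),
                   function(j) as.numeric(table(factor(m[, j], levels = alphabet))),
                   numeric(length(alphabet)))
  counts <- counts + pseudo
  freqs <- sweep(counts, 2, colSums(counts), "/")
  rownames(freqs) <- alphabet
  freqs
}

# Per-position KL divergence D(p || q) in bits, plus logo letter heights
# (frequency times column KL, the usual logo convention).
kl_logo <- function(pep, bg = uniform_bg, pseudo = 0) {
  p <- pfm(pep, names(bg), pseudo)
  # p / bg recycles bg down each column (length(bg) == nrow(p));
  # residues with p == 0 contribute 0 by the convention 0 * log(0) = 0
  terms <- ifelse(p > 0, p * log2(p / bg), 0)
  kl <- colSums(terms)
  heights <- sweep(p, 2, kl, "*")
  list(kl = kl, heights = heights)
}

peps <- c("AC", "AD", "AE", "AF")
res <- kl_logo(peps)
round(res$kl, 3)   # 4.322 2.322
```

With a uniform background the KL score reduces to the Shannon information content log2(20) - H (position 2 above has H = 2 bits, giving log2(5)), so it is a non-uniform background such as BGFREQS that makes the two logo types differ. The heights matrix has residues as row names, which is the kind of input ggseqlogo's custom-height mode (method = "custom") is meant for.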

Thinking about it, perhaps we should make PepTools a separate package, but still part of the tidysq family? A bit like ggplot2 is part of the tidyverse?

I'm interested in your thoughts! 👍

michbur commented 5 years ago

I think all these functions fit within the scope of tidysq (transforming biological sequence data into features usable for ML). Only sequence logos seem slightly out of scope, but in the end they are relevant, for example, for presenting significant features (motifs). Right now, during this phase of intense development, I would rather keep everything in one package. After your holiday we might see a need to split the functionality across different packages.