bigbio / pmultiqc

A library for QC report based on MultiQC framework
GNU General Public License v3.0
13 stars 8 forks source link

modify and simplify the code #52

Closed WangHong007 closed 2 years ago

jpfeuffer commented 2 years ago

Much better but can we completely refactor this into a class?

class Histogram:
  breaks : set(int|float)
  bin_count : list(int)
  description : str

  def setBreaks (list|set)
    ...
  def addValue(int|float):
    #use https://docs.python.org/3/library/bisect.html to look up index in breaks
    #increase bin count at index
    ...
  def getValues () -> list(int):
    return bin_count
  def getNrBins () -> int:
    return len (bin_count)
  def getBinNames() -> list(str):
    #"prevbreak-nextbreak"
    #last bin will be ">last break"
    ...
  def to_json_dict() -> dict:
    ...
WangHong007 commented 2 years ago

Hi Julianus! @jpfeuffer You mean encapsulate all drawing functions into one class or just include drawing histograms? I will continue to modify it in this PR.

jpfeuffer commented 2 years ago

Hi! I think histograms are fine for now. Also the plotting is done by multiQC you just need the conversion to a dict.

jpfeuffer commented 2 years ago

Which other plotting functions were you thinking about? As soon as they share logic, it may make sense to refactor them.

jpfeuffer commented 2 years ago

Hi, thank you! But unfortunately this is not exactly what I was thinking of. The histogram class is responsible for representing the internals of a histogram.

It will keep track of the counts and the bins. You will create one Histogram instance for each plot. E.g. instead of all the self.metric_distribution members.

Then you add values to the histogram in the parsing function.

Then you use the Histogram.to_dict function in the plot functions to get out the values and labels as dict for plotting.

jpfeuffer commented 2 years ago

For example:

self.pep_per_protein = Histogram()
self.pep_per_protein.setBreaks([100,200,500,1000])

self.pep_per_protein.description = "peptides per protein"

def parseFile:
   ...
   self.pep_per_protein.addValue(125)

def plot_pep_per_protein:
   ...
   plotDict={xyz: "..."}
   plotDict.update(self.pep_per_protein.description.to_dict())
WangHong007 commented 2 years ago

@jpfeuffer Thanks for your patience and comments, I will try it.

jpfeuffer commented 2 years ago

Wow very detailed! Thank you! One more thing I would like to ask: Can you restructure the docstrings to follow a certain docstrings format? I would suggest ReStructuredText: https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html

ypriverol commented 2 years ago

ping @jpfeuffer

jpfeuffer commented 2 years ago

Question: Did you test somehow that the output is the same?

ypriverol commented 2 years ago

Good point, it is probably good @WangHong007 to add a test that output the files and we can download them manually abd check them