VEuPathDB / EdaDataService

Apache License 2.0

Address compute thoughts, ideas, concerns #218

Closed: ryanrdoherty closed this issue 1 year ago

ryanrdoherty commented 1 year ago

The following comments were in the compute RAML in the EDA data service; much of that RAML has been copied/transferred to the compute service. I feel like an issue is a better place to discuss and address them.

  # Compute responses: for later, but leaving the draft here.
  # The compute will be baked right into the viz plugin for now.
  # Still need to draft the *ComputeData objects.

  # Realized there may someday be so many options that we won't want to list all of them here.
  # Having ParameterSet at the top level should make computations easier to find.

  ParameterSet:
    additionalProperties: false
    properties:
      parameters: 
        type: string[]

  # Some questions here:
  # What if there is more than one computed var? Do we need multiple ranges?
  # - AB can we assume that if one computation returns multiple numeric vars, they'd form a collection var so they all share the same range?
  # - AB no, beta div phase dolphin will break this, because it wants to return the beta div dissimilarity matrix *and* the PCoA results.
  # - AB but should one computation return vars so different that they aren't in a collection? In the beta div case, should the dissimilarity measures and PCoA results be different computations in the same app?
  # - AB phase 1 (alpha div, beta div, abundance, correlation, and diff abund) will be okay.
  # - AB but what really *is* a computation? Ideally one computation = 1 computed variable.
  # What if the computed var is categorical? Edge case for now, but still.
  # - AB will need to return an ordered vocabulary, whether it is ordinal, and...
  # Need a better name for computeName, so it isn't confused with the name prop above?
  # What is computeName anyway?
  # Two conceptual levels here: ComputedVariableMetadata and ComputeMetadata.
  # Maybe Metadata -> ComputeMetadata, with a ComputedVariableMetadata child?
  # AB some apps may generalize to clinepi (ranked abundance). Maybe 'record' is a better name than 'SampleID' for the sake of generalizability?
  # AB plot responses return the config. Should app responses also return a nice config?
  # AB don't forget about entities!

  # Proposal
  # 1. Apps are a list of computations
  # 2. Computations take parameters as input and output a single var or a collection var. Computations have a name, a unique set of parameters, computed var(s), and details on how the run went.
  # 3. Each computed var is a single output. Computed vars have names, displayNames, and entities.
  # 4. Computed vars are either categorical or numeric, so they must also carry the appropriate annotations (e.g., display range, vocabulary).

  # Under this proposal, the beta div app would return the dissimilarity matrix and the PCoA results as two separate computations.
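
The four-point proposal can be written down as RAML 1.0 types in the same style as the ParameterSet draft. This is a hypothetical sketch, not the compute service's actual schema: the type names (App, Computation, ComputedVariable, NumericComputedVariable, CategoricalComputedVariable) and properties such as dataShape, entityId, displayRangeMin/displayRangeMax, isOrdinal, and details are assumptions made for illustration.

```raml
#%RAML 1.0 Library
# Hypothetical sketch of the proposal; names are illustrative assumptions,
# not the real compute service API.
types:
  ParameterSet:
    additionalProperties: false
    properties:
      parameters: string[]

  # Point 3: a computed var is one output, with a name, displayName, and entity.
  ComputedVariable:
    type: object
    discriminator: dataShape        # tells clients which subtype they received
    properties:
      dataShape: string             # "numeric" or "categorical" (see subtypes)
      name: string
      displayName: string
      entityId: string              # hypothetical: the entity the var belongs to

  # Point 4: numeric vars carry a display range.
  NumericComputedVariable:
    type: ComputedVariable
    discriminatorValue: numeric
    properties:
      displayRangeMin: number
      displayRangeMax: number

  # Point 4: categorical vars carry an ordered vocabulary and an ordinal flag.
  CategoricalComputedVariable:
    type: ComputedVariable
    discriminatorValue: categorical
    properties:
      vocabulary: string[]          # listed in display order
      isOrdinal: boolean

  # Point 2: a computation has a name, parameters, its output var(s), and run details.
  Computation:
    type: object
    properties:
      name: string
      parameters: ParameterSet
      computedVariables:            # one var, or the members of one collection var
        type: ComputedVariable[]
        minItems: 1
      details: string               # hypothetical: how the run went

  # Point 1: an app is a list of computations.
  App:
    type: object
    properties:
      name: string
      computations: Computation[]
```

With this shape, the beta div app would hold two Computation entries, one for the dissimilarity matrix and one for the PCoA results, and the discriminator lets a client pick the right annotations (range or vocabulary) for each computed var without guessing.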

d-callan commented 1 year ago

I'm pretty sure all of these points have been addressed at some point along the way. I'm going to close this; we can open specific issues later if and as problems arise.