VEuPathDB / EdaNewIssues

Mbio: handle samples with 0s for non-abundance data #600

Open asizemore opened 1 year ago

asizemore commented 1 year ago

First seen here https://github.com/VEuPathDB/microbiomeComputations/issues/31

Some samples have 0s for all vars in a collection. When the collection is abundance, we just remove the sample because that's the main reason why that sample exists! When the collection is pathway or CORRAL output, we don't want to remove the sample. Still, we need some way to handle these samples because that row of 0s should not be included in the computations. Either it could skew results or, in some cases, it will cause the whole computation to fail.

Some ideas so far:

  1. Have the compute kick out samples that have all 0s. Then return a message that says how many samples got kicked out. The frontend will display this nicely somewhere.
  2. Have the workflow use hidden vars to indicate which samples it expects to have data, and then the client will secretly filter on this hidden var. Would still give the user messaging about this.
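
To make idea 1 concrete, here is a rough Python sketch, for illustration only. It is not code from microbiomeComputations; the function name `drop_all_zero_samples`, the dict-of-lists input shape, and the message wording are all assumptions.

```python
from typing import Dict, List, Tuple


def drop_all_zero_samples(
    samples: Dict[str, List[float]],
) -> Tuple[Dict[str, List[float]], str]:
    """Hypothetical helper: drop samples whose values are all 0.

    Returns the remaining samples plus a message the frontend could show.
    """
    # Keep a sample only if at least one of its values is non-zero.
    kept = {
        sample_id: values
        for sample_id, values in samples.items()
        if any(v != 0 for v in values)
    }
    n_removed = len(samples) - len(kept)
    message = f"{n_removed} sample(s) removed because all values were 0."
    return kept, message


# Toy collection (made-up data): sample_b is a row of all 0s.
collection = {
    "sample_a": [0.0, 3.0, 1.0],
    "sample_b": [0.0, 0.0, 0.0],
    "sample_c": [2.0, 0.0, 0.0],
}
kept, message = drop_all_zero_samples(collection)
```

The compute would then run on `kept` only and pass `message` back alongside its results, so the user can see how many samples were excluded.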

Open questions:

  1. Do we expect user datasets to ever include rows of all 0s?
  2. Should we kick these samples out of all vizs as well?
  3. Where does the frontend messaging go? Mockup time!

d-callan commented 1 year ago

option 2 has the benefit of working in places other than the computes. we could update total counts in the subsetting tab, exclude these samples from the non-compute vizs, etc.

as for UDs, I think there should be some validation against this when uploading/installing. that would be a separate issue.

asizemore commented 1 year ago

That'd be great! Who would you suggest we talk to in order to get this fleshed out? JB, jay maybe?

d-callan commented 1 year ago

Let's confirm in an mbio meeting, and talk to @jbrestel

d-callan commented 8 months ago

https://github.com/VEuPathDB/microbiomeComputations/issues/65 is a case of this. I think I'd like to do option 1 from above to handle that ticket, see if we get lucky and have it resolve any other open issues about failed computes, and leave this ticket open in case we later decide we'd like to do option 2 as well.