IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.
https://umgear.org
GNU Affero General Public License v3.0

create a gene cart format that can include many named unweighted gene carts #598

Open carlocolantuoni opened 1 year ago

carlocolantuoni commented 1 year ago

create a gene cart format that can include many named unweighted gene carts, i.e. simple unweighted gene lists, such as a set of marker genes for 5 different cell types.

this could simply look like this: image

adkinsrs commented 1 year ago

I would assume that a single gene needs to be able to belong to multiple labels in this format. Maybe we could call this a "labeled" or "categorical" gene cart that would be stored as a new database table (most likely) and then converted to an intermediate JSON structure ({label:[gene1, gene2]}) in code.
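
As a rough illustration of that intermediate structure, here is a minimal sketch in Python. The label and gene names are placeholders, and `labels_for_gene` is a hypothetical helper, not an existing gEAR function. It shows that one gene can sit under several labels without any special handling.

```python
import json

# Placeholder labeled gene cart: each label maps to a plain unweighted gene list.
# "GeneB" deliberately appears under two labels.
labeled_cart = {
    "cell_type_1": ["GeneA", "GeneB"],
    "cell_type_2": ["GeneB", "GeneC"],
}


def labels_for_gene(cart, gene):
    """Return every label whose gene list contains the given gene (hypothetical helper)."""
    return sorted(label for label, genes in cart.items() if gene in genes)


# The structure serializes directly to the {label: [gene1, gene2]} JSON shape.
serialized = json.dumps(labeled_cart, indent=2)
print(serialized)
print(labels_for_gene(labeled_cart, "GeneB"))
```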

adkinsrs commented 11 months ago

Thoughts I have with respect to this

  1. (QUESTION) I assume this is going to be used for projections, right? What would you like to use these carts for beyond that?
  2. (THOUGHTS) Currently, this implementation (the image from the opening comment) does not suggest that Ensembl IDs will be provided. To me, this means that we need to implement these like unweighted gene carts, which use the Ensembl ID stored in the database.
  3. (COMMENT) I think the best way to handle this is to add the "labeled" gene cart to the gene cart table in the database, but also break it into one unweighted gene cart per label and add those to the gene cart table as well. There will be a new "parent gene cart" table (because "gene cart group" is already taken) that will link the labeled gene cart to each unweighted gene cart. The only downside I can think of is the potential for a lot of new unweighted carts to choose from in normal tasks.
  4. (THOUGHTS) We could also have some gene cart manager functionality to let you build a labeled gene cart from existing unweighted gene carts in the database. I suppose you could do this with weighted gene carts too, but they are usually not applicable.
  5. (THOUGHTS) It's worth noting that currently the plotting API calls do not have functionality to map the genes to their label (and would drop duplicated genes across labels). We would have to add some new functionality/visualization to the plots to delineate the labels (colors, shapes, etc.). Scanpy already does this with several of their plotting tools, but we would have to build our own for the plotly functions.
  6. (COMMENT) I think for ease of use, there are two ways to require the user to submit these files
    1. Like Carlo has it in the opening image. But this would pretty much restrict the user to Excel-format only since the user would have to skip tabs or commas if submitting a text file with the labels-as-headers approach
    2. Have the first column be the "label" column... one label per row. Everything after that (comma- or tab-delimited) is a gene for that label. We will have to disallow space-separated files since labels could contain spaces. I am in favor of this approach personally as it opens up submission to tsv and csv files
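
To make the second option concrete, here is a minimal sketch of a parser for the label-first layout, written in Python with only the standard library. `parse_labeled_cart` is a hypothetical name and not part of the gEAR codebase; the sketch assumes one label per row, a tab or comma delimiter, and that repeated genes within a label should be collapsed.

```python
import csv
import io


def parse_labeled_cart(text, delimiter="\t"):
    """Parse rows of 'label<delim>gene1<delim>gene2...' into {label: [genes]}.

    Hypothetical helper: the first cell of each row is the label, and every
    remaining non-empty cell is a gene for that label.
    """
    cart = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines and rows with no label
        label = cells[0]
        genes = cart.setdefault(label, [])
        for gene in cells[1:]:
            # Ignore empty cells and genes already listed under this label.
            if gene and gene not in genes:
                genes.append(gene)
    return cart


# Labels may contain spaces because only tabs (or commas) separate fields.
tsv_text = "cell type 1\tGeneA\tGeneB\n\ncell type 2\tGeneB\tGeneC\tGeneC\n"
print(parse_labeled_cart(tsv_text))
```

The same function handles csv input by passing `delimiter=","`.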

carlocolantuoni commented 11 months ago

  1. yes, for projection and multigene views
  2. yes - i think it should work as unweighted gene carts currently work, with the exception that this would be several linked unweighted gene carts (1 in each column)
  3. might have to discuss this one - i don't get all the details here (likely because i don't know the details of how gene carts are stored etc). but i do agree i don't think we want them all individually indexed - the lists of carts will grow too big. does this necessitate a new "class" of cart for these grouped/labeled unweighted carts?
  4. makes sense to me, don't think we need this for weighted carts
  5. don't get this either - let's discuss

adkinsrs commented 1 month ago

It just occurred to me that this could be accomplished by taking the union of all genes and creating a weighted gene cart where the loadings are the labels and the values are a binary 1 (is in label set) or 0 (is not).

So maybe we can add a new uploader function in the gene cart manager that would take the format in the opening image and convert it to a binary weighted gene cart.
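
A minimal sketch of that conversion, in plain Python, assuming the uploaded file has already been read into a {label: [genes]} dictionary. `to_binary_weighted` and the "gene" header name are hypothetical; the real weighted gene cart layout in gEAR may differ.

```python
def to_binary_weighted(cart):
    """Turn {label: [genes]} into a header plus rows of 0/1 membership values.

    Hypothetical helper: rows are the union of all genes (first-seen order),
    columns are the labels, and each value is 1 if the gene is in that label's
    list, else 0.
    """
    labels = list(cart)
    genes = []
    for label in labels:
        for gene in cart[label]:
            if gene not in genes:
                genes.append(gene)  # build the union without duplicates
    header = ["gene"] + labels
    rows = [
        [gene] + [1 if gene in cart[label] else 0 for label in labels]
        for gene in genes
    ]
    return header, rows


# Placeholder input; "GeneB" belongs to both labels.
example_cart = {
    "cell_type_1": ["GeneA", "GeneB"],
    "cell_type_2": ["GeneB", "GeneC"],
}
header, rows = to_binary_weighted(example_cart)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

A gene shared across labels simply gets a 1 in more than one column, so nothing is dropped when labels overlap.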