IGS / gEAR

The gEAR Portal was created as a data archive and viewer for gene expression data including microarrays, bulk RNA-Seq, single-cell RNA-Seq and more.
https://umgear.org
GNU Affero General Public License v3.0

Implement ProjectR #243

Closed: adkinsrs closed this issue 1 year ago

adkinsrs commented 2 years ago

This is a continuation of https://github.com/nemoarchive/analytics/issues/104 and https://github.com/nemoarchive/analytics/issues/111

Tagging carlocolantuoni to notify him of the new ticket, and I will close the existing tickets

[image: Screen Shot 2022-03-10 at 10 28 56 AM]

Will make some subtasks for each of the diagrammed areas in the screenshot.

Currently a basic demo exists on umgear.org/projection.html, but it only works for specific datasets. However, the display and gene list architecture is already in place to integrate this into the front page/gene results page.

carlocolantuoni commented 2 years ago

hey joshua -

is the issue mentioned below on your plate or shaun's?

Also, rather than having this be a part of the execution when a user requests a projection we should instead do the following:

  1. Pass through all current datasets and add IDs for any datasets which don't have them.
  2. Ensure that the upload process involves dataset validation so that all have identifiers AND gene symbols at the beginning.

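I've made a flow chart of this here: https://docs.google.com/presentation/d/14EHyXnY9GEOjSKr-hz4MbnL4VJDQgb_3rEiGKotQcl0/edit?usp=sharing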
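A minimal sketch of the upload-time check described in step 2, assuming each gene row carries an identifier and a gene symbol. The `GeneRow` structure, the `find_invalid_rows` helper, and the example rows are hypothetical and only illustrate the idea; they are not gEAR's actual upload code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneRow:
    """One gene row of an uploaded dataset (hypothetical structure)."""
    ensembl_id: Optional[str]
    gene_symbol: Optional[str]


def find_invalid_rows(rows: List[GeneRow]) -> List[int]:
    """Return the indices of rows missing an identifier or a gene symbol."""
    invalid = []
    for i, row in enumerate(rows):
        has_id = bool(row.ensembl_id and row.ensembl_id.strip())
        has_symbol = bool(row.gene_symbol and row.gene_symbol.strip())
        if not (has_id and has_symbol):
            invalid.append(i)
    return invalid


# Illustrative rows only: one complete, one with a symbol but no ID,
# one with an ID but an empty symbol.
rows = [
    GeneRow("ENSMUSG00000000001", "Gnai3"),
    GeneRow(None, "Pbsn"),
    GeneRow("ENSMUSG00000000037", ""),
]
invalid = find_invalid_rows(rows)
```

An upload would be rejected (or flagged for ID assignment) whenever `invalid` is non-empty; the same function could drive the one-time pass over existing datasets in step 1.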

adkinsrs commented 2 years ago

Notes for July 20, 2022

jorvis commented 2 years ago

@carlocolantuoni - My issues to work on

carlocolantuoni commented 2 years ago

@jorvis and i covered our approach to orthologue mapping today.

@jorvis - let me know if this is your understanding of exactly what we settled on for mapping genes across species:

using the multiple species orthologue map at: https://fms.alliancegenome.org/download/ORTHOLOGY-ALLIANCE_COMBINED.tsv.gz, to map genes across species, we will:

1] use the least stringent level of evidence linking orthologues across species, i.e. if any of the many tools to map orthologues connect a pair of genes across species, we'll take it. this gives the highest number of 1-to-1 mappings across species, but does produce numerous multiple orthologue mappings as well.

2] among these multiple mappings, we will take the single orthologue mapping which has the highest number of orthologue mapping tools that give it support.

3] if there are ties among multiple mappings at this point, we will allow the user to pick one where possible. in automated settings such as projectR across species, we will simply select one of the remaining orthologues at random (@jorvis maybe we can check if one shares the same gene name and prioritize that one?). this way we will end with only 1-to-1 mappings. we decided this is better than allowing the multiple mappings because that would complicate the math of projection.
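The three rules above can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` record, the `pick_one_per_source` function, and the tool counts are hypothetical, and the sketch works on in-memory records rather than parsing the Alliance file. It only resolves the case where one source gene has several candidate orthologues; it does not check whether two source genes end up on the same target gene.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class Candidate(NamedTuple):
    source_gene: str  # gene symbol in the source species
    target_gene: str  # candidate orthologue symbol in the target species
    n_tools: int      # number of mapping tools supporting this pair


def pick_one_per_source(
    candidates: Iterable[Candidate],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Keep a single orthologue per source gene.

    1. Any supported pair is a candidate (least stringent evidence).
    2. Keep the candidate(s) with the highest tool count.
    3. On ties, prefer a target sharing the source gene's name
       (case-insensitive); otherwise pick one at random.
    """
    rng = rng or random.Random()
    by_source = defaultdict(list)
    for cand in candidates:
        by_source[cand.source_gene].append(cand)

    mapping = {}
    for source, group in by_source.items():
        best = max(c.n_tools for c in group)
        tied = [c for c in group if c.n_tools == best]
        same_name = [c for c in tied if c.target_gene.lower() == source.lower()]
        chosen = same_name[0] if same_name else rng.choice(tied)
        mapping[source] = chosen.target_gene
    return mapping


# Made-up support counts, for illustration only.
candidates = [
    Candidate("Sox2", "SOX2", 10),
    Candidate("Sox2", "SOX3", 4),
    Candidate("Abc1", "XYZ1", 3),
    Candidate("Abc1", "ABC1", 3),  # tie, resolved by matching name
]
mapping = pick_one_per_source(candidates, random.Random(0))
```

Passing a seeded `random.Random` keeps the random tie-break reproducible between runs, which may matter if the same projection is expected to give the same result twice.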

carlocolantuoni commented 2 years ago

similar to the way we are working on comparing GENE VALUES in weighted gene carts in "Gene cart manager: Allow ability to compare weight distributions between patterns #403", we need to be able to compare SAMPLE VALUES, such as projected values for a specific pattern/geneCart, with sample metadata, e.g. do the projected PC1 values correlate with the age of samples? or with the expression of a particular gene? any metric with a value for each sample could be compared in this way.
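As a rough sketch of this kind of sample-level comparison, a Pearson correlation between projected values and a per-sample metric could look like the following. The `pearson` helper and the numbers are hypothetical; any per-sample metric (age, expression of a gene) could take the place of `age`.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two per-sample vectors of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length with >= 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: one projected PC1 value and one age per sample.
projected_pc1 = [0.1, 0.4, 0.5, 0.9]
age = [2, 5, 7, 10]
r = pearson(projected_pc1, age)
```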

should this be a separate ticket?

adkinsrs commented 2 years ago

From today's standup

Ronna had a request to use projectR in a dataset that @songeric1107 is leading the charge on. Currently @songeric1107 is going to use BioMart to perform Mouse to Human mapping between source and target datasets, and will perform the PCA to have weights for the gene markers, but it was initially discussed to use unweighted gene carts to accomplish this (which was put on hold and untested). Given that cross-species gene mapping is not fully implemented yet, I may have to finagle something to get projection running for this scenario

Reference https://github.com/jorvis/gEAR/issues/1145

carlocolantuoni commented 2 years ago

i have an R function that will do the biomaRt mapping if @songeric1107 is interested. using the PC loadings as a weighted gene cart should work at this point. if we upload the mouse gene cart with the human gene ids we get from the biomaRt mapping then the projection should work as implemented already (it will just think it's a human gene cart)
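A small sketch of that relabelling step, assuming the mouse-to-human map is already available as a dictionary (for example, exported from a biomaRt query). The `remap_cart` helper and the placeholder gene names are hypothetical, and keeping the first weight when two mouse genes map to the same human gene is an assumption, not a decision made in this thread.

```python
from typing import Dict, List, Tuple


def remap_cart(
    weights: Dict[str, float],
    mouse_to_human: Dict[str, str],
) -> Tuple[Dict[str, float], List[str]]:
    """Relabel a weighted mouse gene cart with human gene ids.

    Returns the relabelled cart and the mouse genes that could not be
    carried over (no mapping, or the human id was already taken).
    """
    remapped: Dict[str, float] = {}
    dropped: List[str] = []
    for mouse_gene, weight in weights.items():
        human_gene = mouse_to_human.get(mouse_gene)
        if human_gene is None or human_gene in remapped:
            dropped.append(mouse_gene)
        else:
            remapped[human_gene] = weight
    return remapped, dropped


# Placeholder names and PC loadings, for illustration only.
mouse_cart = {"mouse_gene_A": 0.8, "mouse_gene_B": -0.3, "mouse_gene_C": 0.1}
mouse_to_human = {"mouse_gene_A": "human_gene_A", "mouse_gene_B": "human_gene_B"}
human_cart, dropped = remap_cart(mouse_cart, mouse_to_human)
```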


adkinsrs commented 2 years ago

Discovered a bit of a caveat with running projectR. If genes from the target dataset are duplicated in the index, it interferes with projectR's call to rownames (an R function), which results in an intersection of 0 genes between the two inputs and a bad output.

I am solving this by running adata.var_names_make_unique() on the target dataset's AnnData object, which renames every duplicated gene except the first occurrence. As a consequence, the renamed duplicates are essentially excluded from the overlap with the pattern's genes.

https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.var_names_make_unique.html#anndata.AnnData.var_names_make_unique
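To illustrate the effect on the gene overlap without requiring anndata, here is a simplified pure-Python stand-in for the renaming. The `make_unique` helper is hypothetical and is not anndata's implementation; it only mimics the documented behaviour of appending a number to each repeated name while the first occurrence keeps its name.

```python
from collections import Counter
from typing import List


def make_unique(names: List[str], join: str = "-") -> List[str]:
    """Rename repeats as name-1, name-2, ...; the first occurrence is kept.

    Simplified stand-in: it does not guard against a renamed value
    colliding with a name that already exists in the list.
    """
    seen: Counter = Counter()
    unique = []
    for name in names:
        if seen[name] == 0:
            unique.append(name)
        else:
            unique.append(f"{name}{join}{seen[name]}")
        seen[name] += 1
    return unique


target_genes = ["Gapdh", "Actb", "Gapdh", "Sox2"]
pattern_genes = {"Gapdh", "Sox2", "Pax6"}

unique_genes = make_unique(target_genes)
# Only names that still match the pattern exactly take part in the overlap.
overlap = [g for g in unique_genes if g in pattern_genes]
```

In this toy case the second `Gapdh` becomes `Gapdh-1` and no longer matches the pattern, so only one `Gapdh` row remains in the overlap.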

carlocolantuoni commented 2 years ago

that is reasonable shaun👍


adkinsrs commented 2 years ago

Implemented the ability to project on unweighted gene carts. Each gene gets a "1" assigned for their weight. Once I implemented this, I also ended up refactoring some of the JSTree code that loads all the genecart patterns in order to make it flow cleaner. This is currently on devel.umgear.org for playing around with
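A minimal sketch of the weight assignment, assuming an unweighted cart is just a list of gene symbols. Both helpers are hypothetical, and the weighted sum in `score_sample` is only a simple stand-in to show how the weights are consumed; it is not what projectR computes.

```python
from typing import Dict, Iterable, Tuple


def to_weighted_cart(genes: Iterable[str]) -> Dict[str, float]:
    """Turn an unweighted gene cart into a weighted one with weight 1 per gene."""
    return {gene: 1.0 for gene in genes}


def score_sample(expression: Dict[str, float], cart: Dict[str, float]) -> Tuple[float, int]:
    """Weighted sum over genes shared by the sample and the cart.

    Returns the score and the number of overlapping genes.
    """
    shared = [gene for gene in cart if gene in expression]
    score = sum(cart[gene] * expression[gene] for gene in shared)
    return score, len(shared)


cart = to_weighted_cart(["Sox2", "Pax6", "Nes"])
sample_expression = {"Sox2": 2.0, "Pax6": 3.0, "Gapdh": 10.0}  # made-up values
score, n_overlap = score_sample(sample_expression, cart)
```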

carlocolantuoni commented 2 years ago

great!


adkinsrs commented 2 years ago
[image: Screen Shot 2022-09-02 at 11 01 29 AM]

@carlocolantuoni I have added the hover to show the number of overlapping genes between the weighted gene cart and the target dataset. However, I am not sure this is the best location to place this hover. For tSNE plots, the hover bar starts to crowd the left axis title. If I place the hover bar at the bottom, I worry it will interfere with axis labels for various plots. If I put it in the dataset title area (purple area above plot) I feel some crowding will happen there too. Ideally I'd love to have it in a single location and not have to code a per-plottype location. Another alternative would be to just have the icon hoverable with no text pre-hover, but then I worry users would not know what this does or why it is here. @jorvis , do you have any thoughts as well?

Another thought I have is just placing the hover in its own box between the dataset title and the plot, and I may explore this. Ideally it works best if all plots have the same hover, otherwise having a mix of datasets with and without the hover element will result in different container heights (which is not aesthetically pleasing)

carlocolantuoni commented 2 years ago

ya, i think the easiest solution here would be to reduce the pre-hover text to just the "!" icon, even just an "i"

carlocolantuoni commented 2 years ago

i'm adding info here that @adkinsrs has shared with me about where and how gene carts can be created inside gEAR/NeMO analytics - it's relevant to other tickets as well, but i think it's good to have here for reference in the projectR implementation. Shaun, you have completed all of these, correct?

Compare tool - saving all genes as weighted gene carts (raw foldchange, log2, or log10)
Compare tool - saving a selection of genes as unweighted gene cart
scAnalysis workbench - saving all genes from PCA as a weighted gene cart
Multigene curator - volcano plot - saving a selection of genes as unweighted gene cart
Multigene curator - volcano plot - saving all genes as weighted gene cart
Multigene curator - quadrant plot - saving a weighted gene cart

this is so great shaun, this will really allow users to leverage all of NeMO's tools for diverse projection applications!

adkinsrs commented 2 years ago
[image: Screen Shot 2022-09-08 at 3 52 01 PM] [image: Screen Shot 2022-09-08 at 3 52 17 PM]

This is my new proposed design. I made a "badge" out of the info button, and when you hover it extends for the full message. It does overhang into the next plot but I think if the person is attempting to view this info, they are more focused on the hover info than the plot next to it.

jorvis commented 2 years ago

I think that's better.

carlocolantuoni commented 2 years ago

Ya thats great


adkinsrs commented 2 years ago
[image: Screen Shot 2022-09-09 at 2 28 49 PM]

Changed the hover to wrap in the container

carlocolantuoni commented 2 years ago

perfect!


adkinsrs commented 2 years ago
[image: Screen Shot 2022-09-12 at 9 26 19 AM]

Added a link to the projectR paper. Hopefully this will serve as an introduction to the purpose of the tool.

adkinsrs commented 2 years ago

Since the main implementation is complete, I am moving this to Done in the seasonal taskboard. However, we will still add updates to this ticket (or create new tickets) and close this one once the projectR branch has been merged into main

carlocolantuoni commented 2 years ago

is the implementation along with species cross referencing ready to move to production on gEAR and NeMO?


adkinsrs commented 2 years ago

Personally I think the implementation is ready to be moved over. However, the cross-referencing is not ready, and when that is handed to me I will have to modify projectR code, so we might as well not merge it into the "main" branch yet.

My previous comment was just referencing moving the ticket in the current "project" sprint to Done from In Progress, since the only things left are minor things that can be placed into new tasks or subtasks

jorvis commented 2 years ago

@carlocolantuoni I just this morning got past all the data issues for mapping and generated the full set of orthology maps. Now we have to start working on the software updates to actually read them.

carlocolantuoni commented 2 years ago

great guys! how much/what software updating is necessary for this?


adkinsrs commented 2 years ago

Suggestion:

Implement a lock file in case multiple people run projectR on the same dataset/pattern simultaneously (like in a workshop)
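One way such a lock could look, as a sketch only: an exclusive-create lock file keyed on the dataset and pattern. The `projection_lock` helper, the lock-file naming, and the ids are hypothetical. A process that crashes while holding the lock would leave a stale file behind, which a real implementation would need to handle (for example with a timeout).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def projection_lock(lock_dir: str, dataset_id: str, pattern_id: str):
    """Yield True if this caller acquired the lock, False if it is already held."""
    lock_path = os.path.join(lock_dir, f"{dataset_id}.{pattern_id}.lock")
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False  # someone else is running this dataset/pattern pair
        return
    try:
        os.close(fd)
        yield True
    finally:
        os.remove(lock_path)  # release only the lock we created


with tempfile.TemporaryDirectory() as lock_dir:
    with projection_lock(lock_dir, "dataset1", "patternA") as first:
        # A second request for the same pair while the first is running.
        with projection_lock(lock_dir, "dataset1", "patternA") as second:
            pass
    # After the first run finishes, the pair can be locked again.
    with projection_lock(lock_dir, "dataset1", "patternA") as third:
        pass
```

A caller that gets `False` could wait and then read the cached result written by the first run, rather than recomputing the same projection.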

adkinsrs commented 2 years ago

Suggestion:

Implement a queue system so running projectR is staggered and not going to crash the server
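A minimal sketch of staggering the runs with a single worker thread and a FIFO queue, so only one projection executes at a time. `run_projection` is a placeholder for the real projectR call, and the job tuples are made up; a production setup might use a separate job-queue service instead.

```python
import queue
import threading
from typing import List, Optional, Tuple

Job = Tuple[str, str]  # (dataset_id, pattern_id)


def run_projection(job: Job) -> str:
    """Placeholder for the real projectR call."""
    dataset_id, pattern_id = job
    return f"{dataset_id}:{pattern_id}"


def worker(jobs: "queue.Queue[Optional[Job]]", results: List[str]) -> None:
    """Process jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results.append(run_projection(job))
        finally:
            jobs.task_done()


jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
results: List[str] = []
thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
thread.start()

for job in [("d1", "p1"), ("d2", "p1"), ("d1", "p2")]:
    jobs.put(job)
jobs.put(None)  # sentinel: tell the worker to stop

jobs.join()
thread.join()
```

With one worker the jobs complete in submission order; raising the number of workers would trade that ordering for throughput, bounded by what the server can handle.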

carlocolantuoni commented 2 years ago

also as discussed on email:

we will expand where we can get gene carts from and what we can do with them - e.g. already getting them from the compare tool and PCA is great. can we draw gene carts from the volcano plot in the multi gene view? where else does gEAR/NeMO perform analyses that would be useful for this? can we associate a gene cart with a particular dataset (e.g. PCA of dataset X with dataset X)?

further, performing simple mathematical operations/transformation on gene carts and visualizing gene carts will be extremely important in understanding and using gene carts, e.g. plot one against another or log transform a weighted gene cart.
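A small sketch of the two operations mentioned here, assuming a weighted gene cart is a mapping from gene to weight. The `log2_cart` and `paired_weights` helpers and the example weights are hypothetical; the log transform only makes sense for positive weights such as raw fold changes.

```python
import math
from typing import Dict, List, Tuple


def log2_cart(cart: Dict[str, float]) -> Dict[str, float]:
    """Log2-transform every weight; weights must be strictly positive."""
    transformed = {}
    for gene, weight in cart.items():
        if weight <= 0:
            raise ValueError(f"cannot log-transform non-positive weight for {gene}")
        transformed[gene] = math.log2(weight)
    return transformed


def paired_weights(a: Dict[str, float], b: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Pair up weights of genes present in both carts, ready for a scatter plot."""
    shared = sorted(set(a) & set(b))
    return [(gene, a[gene], b[gene]) for gene in shared]


# Made-up fold changes and loadings, for illustration only.
fold_changes = {"GeneA": 8.0, "GeneB": 0.5}
loadings = {"GeneA": 0.7, "GeneC": -0.2}

logged = log2_cart(fold_changes)
pairs = paired_weights(logged, loadings)
```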

is there a gene cart manager ticket i should move these points to?

On Wed, Jul 13, 2022 at 2:21 PM Carlo Colantuoni @.***> wrote:

as discussed on email: let's also implement a way to project simple, unweighted gene carts. this would be a simple add-on to the R script we currently use for projection.

On Wed, Jul 13, 2022 at 2:14 PM Carlo Colantuoni @.***> wrote:

totally agree - that flow looks perfect. the specific problem we are hitting with these datasets is due to my own omission of ensembl IDs in the more distant past, when i was using only gene symbols in the upload. but of course, as there was no check in the past, many other datasets could be similar. so going thru all current datasets and checking new ones would be perfect to make sure everything works

On Wed, Jul 13, 2022 at 11:45 AM Joshua Orvis @.***> wrote:

Also, rather than having this be a part of the execution when a user requests a projection we should instead do the following:

  1. Pass through all current datasets and add IDs for any datasets which don't have them.
  2. Ensure that the upload process involves dataset validation so that all have identifiers AND gene symbols at the beginning.

I've made a flow chart of this here: https://docs.google.com/presentation/d/14EHyXnY9GEOjSKr-hz4MbnL4VJDQgb_3rEiGKotQcl0/edit?usp=sharing

jorvis commented 2 years ago

No, it's more appropriate to create individual tickets for new feature requests.

carlocolantuoni commented 2 years ago

@jorvis, dont know why it is listing my last comment as 7 days ago - it was a good while ago - you recommended the same at that time and in response i created:

https://github.com/IGS/gEAR/issues/403 - Gene cart manager: Allow ability to compare weight distributions between patterns

https://github.com/IGS/gEAR/issues/405 - Gene cart manager: enable simple math and transformations on weighted gene carts

carlocolantuoni commented 2 years ago

actually looks like u opened 1 and i opened 1, back in august

carlocolantuoni commented 1 year ago

is it not a bit early to close this out? maybe open til its running on production?

jorvis commented 1 year ago

We are merging it into production tomorrow.


carlocolantuoni commented 1 year ago

awesome! hope it goes well, c u wed
