Mathelab / ALTRE

ALTered Regulatory Elements
http://mathelab.github.io/ALTRE/

Adding gene names #46

Closed rfarouni closed 8 years ago

rfarouni commented 8 years ago

To add the gene names, I modified your function from

tssannotgrange <- function(grange, TSS, distancefromTSS) {
  # read in TSS, make column header, change to GRanges
  # find distance to nearest transcription start site
  distancetoTSS <- distanceToNearest(grange, TSS)
  # make dataframe from grange
  newdataframe <- grangestodataframe(grange)
  newdataframe$distance <- mcols(distancetoTSS)$distance
  newdataframe <- within(newdataframe, {
    region <- ifelse(distance <= distancefromTSS,
                     "TSS-proximal",
                     "TSS-distal")
  })
  # annotate anything <=1500 bp away as
  # TSS-proximal, otherwise TSS-distal
  chr <- c()
  annotatedgrange <- with(newdataframe,
                          GRanges(chr,
                                  IRanges(start, stop),
                                  meta = newdataframe[, 5]))
  # create a grange
  colnames(mcols(annotatedgrange)) <- c("region")
  return(annotatedgrange)
}

to

tssannotgrange <- function(grange, TSS, distancefromTSS) {

  distancetoTSS <- distanceToNearest(grange, TSS)
  mcols(grange)$region <- ifelse(mcols(distancetoTSS)$distance <= distancefromTSS,
                                 "TSS-proximal",
                                 "TSS-distal")
  mcols(grange)$gene_name <- mcols(TSS[subjectHits(distancetoTSS), ])$gene_name
  return(grange)
}

so the output is exactly the same, but with one additional metadata column (gene_name):

GRanges object with 256132 ranges and 2 metadata columns:
           seqnames               ranges strand |       region    gene_name
              <Rle>            <IRanges>  <Rle> |  <character>  <character>
       [1]     chr1     [ 10146,  10349]      * |   TSS-distal      DDX11L1
       [2]     chr1     [237719, 237910]      * | TSS-proximal   AP006222.2
       [3]     chr1     [521551, 521614]      * | TSS-proximal RP5-857K21.2
       [4]     chr1     [564454, 565042]      * | TSS-proximal     MTND2P28
       [5]     chr1     [565253, 566084]      * | TSS-proximal     MTND2P28
       ...      ...                  ...    ... .          ...          ...
  [256128]     chrY [59019677, 59020088]      * |   TSS-distal      CTBP2P1
  [256129]     chrY [59020581, 59020934]      * |   TSS-distal      CTBP2P1
  [256130]     chrY [59024187, 59024559]      * |   TSS-distal      CTBP2P1
  [256131]     chrY [59027624, 59027997]      * |   TSS-distal      CTBP2P1
  [256132]     chrY [59029576, 59030134]      * |   TSS-distal      CTBP2P1
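
For anyone following along, here is a minimal sketch of the same lookup on toy data. The ranges, gene names, and the function name annotate_nearest_tss are made up for illustration and are not part of ALTRE. It also indexes by queryHits, on the assumption (as far as I recall from the GenomicRanges documentation) that distanceToNearest leaves out query ranges that have no nearest neighbour, for example a peak on a chromosome with no TSS entries; in that case a direct column assignment could have the wrong length.

```r
# Requires the Bioconductor package GenomicRanges.
library(GenomicRanges)

# Toy peaks (hypothetical data); the third sits on a chromosome with no TSS.
peaks <- GRanges(c("chr1", "chr1", "chr2"),
                 IRanges(start = c(100, 5000, 100), end = c(200, 5100, 200)))
tss <- GRanges("chr1",
               IRanges(start = c(250, 9000), width = 1),
               gene_name = c("geneA", "geneB"))

# Hypothetical variant of tssannotgrange that tolerates unmatched ranges.
annotate_nearest_tss <- function(grange, TSS, distancefromTSS) {
  hits <- distanceToNearest(grange, TSS)
  q <- queryHits(hits)
  # Start from NA so ranges without a nearest TSS stay unannotated.
  region <- rep(NA_character_, length(grange))
  gene_name <- rep(NA_character_, length(grange))
  region[q] <- ifelse(mcols(hits)$distance <= distancefromTSS,
                      "TSS-proximal",
                      "TSS-distal")
  gene_name[q] <- mcols(TSS)$gene_name[subjectHits(hits)]
  mcols(grange)$region <- region
  mcols(grange)$gene_name <- gene_name
  grange
}

annotated <- annotate_nearest_tss(peaks, tss, 1500)
```

If every peak is guaranteed to have a TSS on the same chromosome, the shorter version above behaves the same; the NA padding only matters when some ranges go unmatched.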

The problem now is that the rest of the code needs to be modified, since downstream functions assume that region is the last metadata column. As a result, you get something like this:

$consPeaksAnnotated
GRanges object with 195887 ranges and 4 metadata columns:
           seqnames               ranges strand |  meta.region     meta.A549   meta.SAEC     meta.NA
              <Rle>            <IRanges>  <Rle> |  <character>   <character> <character> <character>
       [1]     chr1     [ 10146,  10349]      * | TSS-proximal       DDX11L1        A549        <NA>
       [2]     chr1     [237719, 237910]      * | TSS-proximal    AP006222.2        A549        <NA>
       [3]     chr1     [521551, 521614]      * | TSS-proximal  RP5-857K21.2        <NA>        <NA>
       [4]     chr1     [564454, 570272]      * | TSS-proximal RP5-857K21.11        A549        <NA>
       [5]     chr1     [713897, 715323]      * | TSS-proximal RP11-206L10.9        A549        <NA>
       ...      ...                  ...    ... .          ...           ...         ...         ...
  [195883]     chrY [59003875, 59006774]      * |   TSS-distal       CTBP2P1        A549        <NA>
  [195884]     chrY [59011978, 59020934]      * |   TSS-distal       CTBP2P1        A549        <NA>
  [195885]     chrY [59024187, 59024559]      * |   TSS-distal       CTBP2P1        A549        <NA>
  [195886]     chrY [59027624, 59027997]      * |   TSS-distal       CTBP2P1        A549        <NA>
  [195887]     chrY [59029576, 59030134]      * |   TSS-distal       CTBP2P1        <NA>        <NA>
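
The shift can be reproduced without any genomic data. The sketch below uses plain data frames with made-up values as stand-ins for the metadata columns; the real downstream ALTRE code is not shown in this thread, so the positional index 2:3 is only an assumption about how the cell-line columns are picked.

```r
# Old layout: region followed by the cell-line columns (hypothetical values).
old_meta <- data.frame(region = c("TSS-distal", "TSS-proximal"),
                       A549 = c("A549", NA),
                       SAEC = c(NA, "SAEC"),
                       stringsAsFactors = FALSE)

# New layout: gene_name now sits between region and the cell lines.
new_meta <- data.frame(region = old_meta$region,
                       gene_name = c("DDX11L1", "AP006222.2"),
                       old_meta[, c("A549", "SAEC")],
                       stringsAsFactors = FALSE)

# Selecting "the cell-line columns" by position, as written for the old layout.
by_position_old <- colnames(old_meta)[2:3]
by_position_new <- colnames(new_meta)[2:3]

# Selecting by name is unaffected by the extra column.
by_name_new <- colnames(new_meta[, c("A549", "SAEC")])
```

With the old layout the positional index returns the two cell-line columns; with the new layout it returns gene_name and only the first cell line, which matches the shifted headers in the output above.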
Mathelab commented 8 years ago

Hi Rick,

The following code should keep ‘region’ last and not break the downstream code, right?


tssannotgrange <- function(grange, TSS, distancefromTSS) {

  distancetoTSS <- distanceToNearest(grange, TSS)
  mcols(grange)$gene_name <- mcols(TSS[subjectHits(distancetoTSS), ])$gene_name
  mcols(grange)$region <- ifelse(mcols(distancetoTSS)$distance <= distancefromTSS,
                                 "TSS-proximal",
                                 "TSS-distal")
  return(grange)
}

Ewy

rfarouni commented 8 years ago

Not really. This is what you get instead:

$consPeaksAnnotated
GRanges object with 195887 ranges and 4 metadata columns:
           seqnames               ranges strand |   meta.region    meta.A549   meta.SAEC     meta.NA
              <Rle>            <IRanges>  <Rle> |   <character>  <character> <character> <character>
       [1]     chr1     [ 10146,  10349]      * |       DDX11L1 TSS-proximal        A549        <NA>
       [2]     chr1     [237719, 237910]      * |    AP006222.2 TSS-proximal        A549        <NA>
       [3]     chr1     [521551, 521614]      * |  RP5-857K21.2 TSS-proximal        <NA>        <NA>
       [4]     chr1     [564454, 570272]      * | RP5-857K21.11 TSS-proximal        A549        <NA>
       [5]     chr1     [713897, 715323]      * | RP11-206L10.9 TSS-proximal        A549        <NA>
       ...      ...                  ...    ... .           ...          ...         ...         ...
  [195883]     chrY [59003875, 59006774]      * |       CTBP2P1   TSS-distal        A549        <NA>
  [195884]     chrY [59011978, 59020934]      * |       CTBP2P1   TSS-distal        A549        <NA>
  [195885]     chrY [59024187, 59024559]      * |       CTBP2P1   TSS-distal        A549        <NA>
  [195886]     chrY [59027624, 59027997]      * |       CTBP2P1   TSS-distal        A549        <NA>
  [195887]     chrY [59029576, 59030134]      * |       CTBP2P1   TSS-distal        <NA>        <NA>
Mathelab commented 8 years ago

OK. I won’t be able to work on this for another couple of days. Is there a separate branch to work on this? Perhaps keep this as the last issue so the others can move along? Ewy

From: Rick Farouni notifications@github.com<mailto:notifications@github.com> Reply-To: Mathelab/ALTRE reply@reply.github.com<mailto:reply@reply.github.com> Date: Sunday, October 9, 2016 at 10:52 PM To: Mathelab/ALTRE ALTRE@noreply.github.com<mailto:ALTRE@noreply.github.com> Cc: Ewy Mathe Ewy.Mathe@osumc.edu<mailto:Ewy.Mathe@osumc.edu>, Assign assign@noreply.github.com<mailto:assign@noreply.github.com> Subject: Re: [Mathelab/ALTRE] Adding gene names (#46)

osubmi784323 commented 8 years ago

I've mostly fixed this. Where should I push it when I am done? Make a new branch?

Mathelab commented 8 years ago

Thank you. Yes, safest to make a branch. Ewy

osubmi784323 commented 8 years ago

Ok, thanks. I made a new branch called "add genes" and pushed the changes. It's fixed, and I tested all the downstream functions and plots to make sure everything works in both R and R Shiny. The main thing that needed fixing downstream was that the cell-line columns were off by one, since they are selected by number.

Mathelab commented 8 years ago

OK good. Are they still selected by number? Best to select by name… Ewy

osubmi784323 commented 8 years ago

Right now the names of the columns are "A549" and "SAEC" so you cannot select by name since it is specific to the cell-type you are studying. I could change it to "Sample 1" and "Sample 2" though.

Mathelab commented 8 years ago

There should be a way to grab those names automatically. You know what the reference is, and that’s now included in the output data frame, no? Ewy
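
One possible way to do this, sketched on a plain data frame: treat every metadata column that is not a known annotation column as a sample column. The helper name get_sample_columns and the fixed annotation list c("region", "gene_name") are assumptions for illustration, not existing ALTRE code.

```r
# Columns assumed to be annotations rather than samples (an assumption).
annotation_cols <- c("region", "gene_name")

# Hypothetical helper: every non-annotation column is taken to be a sample.
get_sample_columns <- function(meta, annotation = annotation_cols) {
  setdiff(colnames(meta), annotation)
}

# Works for any cell types and for either column order (made-up values).
meta_a <- data.frame(region = "TSS-distal", gene_name = "DDX11L1",
                     A549 = "A549", SAEC = NA,
                     stringsAsFactors = FALSE)
meta_b <- data.frame(gene_name = "DDX11L1", HepG2 = "HepG2",
                     K562 = NA, region = "TSS-distal",
                     stringsAsFactors = FALSE)

samples_a <- get_sample_columns(meta_a)
samples_b <- get_sample_columns(meta_b)
```

This keeps the real cell-type names in the output instead of renaming them to "Sample 1" and "Sample 2", and it no longer matters where region or gene_name sit among the columns.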
