MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Problems with shortstack output interpretations #102

Closed RubioB closed 1 year ago

RubioB commented 3 years ago

Hi !

I am a little confused concerning the results obtained with ShortStack.

To sum up, I work with two conditions and in each of them I have three biological replicates.

I first ran ShortStack independently for each condition, which allowed me to obtain the populations of small RNAs in each of the two conditions. To compare the populations of small RNAs between my two conditions, I looked for overlaps between the clusters of the two analyses (with bedIntersect). I found that 70% of the clusters overlap between the two conditions, and so 30% of the clusters are specific to one condition or the other.

At the same time, I wanted to see whether any small RNAs were differentially expressed between my two conditions, so I ran ShortStack again, this time analyzing the two conditions together. I then performed an analysis in DESeq2, and it turned out that only a single cluster was differentially expressed between my two conditions.

Looking at the results of the two analyses, I don't understand how I can find 'only' 70% overlap between my clusters when the DESeq2 analysis finds only one differentially expressed cluster?

Can you give me your opinion on these analyses and results? Maybe there are elements in my approach that are not right?

Thank you in advance for your reply !

Bernadette

MikeAxtell commented 3 years ago

Hi Bernadette, thanks for your question!

I think this is about what I would expect. Differential expression analysis is a completely different procedure from ShortStack's cluster definition method. DESeq2 fits a generalized linear model and estimates the within-condition variation from the biological reps. If the reps are noisy, and especially for lowly expressed clusters, you'll rarely get a significant diff. exp. call.

On the other hand, ShortStack discovers clusters by a simple cutoff of peak abundance. Clusters that are inherently lowly expressed will likely be, at random, just above or just below the threshold in any given grouping of reads. In other words, just because 30% of the clusters weren't found in both conditions by ShortStack does not by itself prove with mathematical certainty that they are diff expressed.
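The threshold effect described above can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not ShortStack's actual cluster-calling algorithm: the Poisson sampling, the `threshold` value, and the function names `poisson` and `simulate_overlap` are all hypothetical choices made for illustration. Every simulated cluster has the same true abundance in both conditions, so any "condition-specific" cluster arises from sampling noise alone.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_overlap(true_means, threshold, seed=0):
    """Fraction of clusters called in at least one condition that are called in both.

    Assumption: each cluster has the SAME true mean in conditions A and B,
    and a cluster is "called" when its sampled abundance reaches `threshold`.
    """
    rng = random.Random(seed)
    both = either = 0
    for lam in true_means:
        called_a = poisson(rng, lam) >= threshold
        called_b = poisson(rng, lam) >= threshold
        if called_a or called_b:
            either += 1
            if called_a and called_b:
                both += 1
    return both / either if either else float("nan")


# Clusters far above the cutoff versus clusters sitting right at the cutoff.
high = simulate_overlap([100.0] * 50, threshold=5)
borderline = simulate_overlap([5.0] * 2000, threshold=5)
print(f"overlap, highly expressed clusters: {high:.2f}")
print(f"overlap, borderline clusters:       {borderline:.2f}")
```

In this toy model, clusters well above the cutoff are found in both conditions, while clusters near the cutoff frequently appear in only one, even though nothing is differentially expressed.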

BTW we usually take a different approach: Take all libraries and do a single ShortStack run - this defines a single set of sRNA clusters and should incorporate all of the data. Then use the counts in Counts.txt to run DESeq2.
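As a rough sketch of the second step, the per-sample counts can be pulled into a cluster-by-sample matrix plus a sample-to-condition table, which are the two inputs a DESeq2 analysis needs. The layout assumed here (tab-delimited, one header line, one column per library) and all column and sample names in the toy data are assumptions for illustration; check the header of your own Counts.txt, since the exact columns depend on the ShortStack version. The helper `load_count_matrix` is hypothetical and not part of ShortStack or DESeq2.

```python
import csv
import io


def load_count_matrix(handle, id_column, sample_to_condition):
    """Return (cluster_ids, samples, matrix) from a tab-delimited counts table.

    Only the columns named in `sample_to_condition` are kept, in that order;
    annotation columns are ignored.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    samples = list(sample_to_condition)
    missing = [s for s in samples if s not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample columns not found: {missing}")
    cluster_ids, matrix = [], []
    for row in reader:
        cluster_ids.append(row[id_column])
        matrix.append([int(row[s]) for s in samples])
    return cluster_ids, samples, matrix


# Toy stand-in for Counts.txt (made-up names and numbers, not real data).
toy_counts = io.StringIO(
    "Locus\tName\tcondA_rep1\tcondA_rep2\tcondB_rep1\tcondB_rep2\n"
    "chr1:100-250\tCluster_1\t12\t15\t9\t11\n"
    "chr1:900-1100\tCluster_2\t0\t3\t40\t37\n"
)
design = {
    "condA_rep1": "A",
    "condA_rep2": "A",
    "condB_rep1": "B",
    "condB_rep2": "B",
}

ids, samples, matrix = load_count_matrix(toy_counts, "Name", design)
col_data = [(sample, design[sample]) for sample in samples]
print(ids)
print(col_data)
```

With a real file, pass `open("Counts.txt", newline="")` instead of the toy `StringIO`. The resulting matrix (rows are clusters, columns are samples) and `col_data` correspond to the count matrix and sample table that DESeq2 takes as input.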

Anyway hope this helps.

-- Michael J. Axtell, Ph.D. Professor of Biology Pennsylvania State University https://sites.psu.edu/axtell https://plantsmallrnagenes.science.psu.edu

RubioB commented 3 years ago

Your remarks confirm the assumptions I had !

Thank you very much for your answer !

Bernadette

