MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Does version 3.8.4 really filter counts by RPM0.5? #69

Closed: Lonelyfighter closed this issue 6 years ago

Lonelyfighter commented 6 years ago

Hi Dr. Axtell, I have been using ShortStack recently and find it quite amazing, but I have a question about the mincov parameter. I think this version does not use rpm to define clusters by default. I ran several tests on my data, in which the alignment rate is only around 4%. I first ran the program with rpm (both by leaving the default and by explicitly setting rpm0.5), but in the Results and Counts files I found many clusters with an rpm below 0.5. Looking for the reason, I found this in the log file:

At specified mincov of 0.5rpm with 11,880,255 placed primary reads, mincov is 6 raw alignments

I also found that all the clusters have raw alignment counts above 6. So I think this value of 6 is what filters the clusters, rather than 0.5 rpm computed over all the reads. In this case, the total number of reads is 463,239,074.

I also ran with rpmm0.5 and got exactly the same result. In the Results file, however, the rpm is calculated from all the reads (mapped + unmapped). Because only 4% of my reads are mapped, many clusters end up with an rpm below 0.5.

In the Test_DATA file, the number of mapped reads is close to the total number of reads, so a "raw alignments" threshold calculated from the mapped reads still works for the full read set. In my case the mapped fraction is low, and that makes a difference. With the mapped reads (11,880,255) I get a mincov of 6 raw alignments, but with all reads (463,239,047) I get a mincov of around 25. The program used 6 to filter the clusters, not 25, even when I used rpm0.5 as the default.
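The arithmetic behind the two thresholds can be sketched in a few lines of Python. This is only an illustration, not ShortStack's code: the function names are made up, and rounding the threshold up to a whole alignment is an assumption (the program's own rounding rule may differ). The read counts are the ones quoted in this post and in the log line.

```python
import math


def mincov_raw(rpm_threshold: float, read_count: int) -> int:
    """Convert a reads-per-million threshold into a raw alignment count.

    Rounding up is an assumption of this sketch, not a documented rule.
    """
    return math.ceil(rpm_threshold * read_count / 1_000_000)


def rpm(raw_alignments: int, read_count: int) -> float:
    """Express a raw alignment count as reads per million of read_count."""
    return raw_alignments * 1_000_000 / read_count


mapped_reads = 11_880_255  # "placed primary reads" from the log line
total_reads = 463_239_047  # mapped + unmapped reads, as quoted in the post

# Threshold when 0.5 rpm is normalised to mapped reads only
print(mincov_raw(0.5, mapped_reads))
# Threshold when 0.5 rpm is normalised to all reads
print(mincov_raw(0.5, total_reads))
# rpm (over all reads) of a cluster that just passes a 6-alignment cutoff
print(round(rpm(6, total_reads), 4))
```

With the mapped-read count this reproduces the 6 raw alignments reported in the log. With the total quoted above, the same formula gives a threshold in the low hundreds rather than "around 25", so one of the figures in the post may contain a typo. Either way, a cluster that just passes the 6-alignment cutoff sits far below 0.5 rpm once it is normalised to all reads.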

So I am wondering whether, in this version, the program uses rpmm to filter the clusters even though the default setting is rpm0.5. I read your replies and saw that you recommend using all the reads to calculate rpm, so I am a little confused.

Could you please check if I am wrong?

Thanks a lot,

Xin.

MikeAxtell commented 6 years ago

Thanks, I’ll look into it. Which version?

(Sent by email in reply to Lonelyfighter's post of Wed, Jan 17, 2018, 5:14 AM.)

-- Michael J. Axtell, Ph.D. Professor of Biology Penn State University http://sites.psu.edu/axtell

Lonelyfighter commented 6 years ago

Hi, it is version 3.8.4. Thanks a lot, Xin.

(Sent by email in reply to Mike Axtell's message of 01/17/2018, 01:20 PM.)

MikeAxtell commented 6 years ago

Yes, this was a bug. When I added the 'rpmm' option in 3.8.4, I didn't do it right, and both 'rpm' and 'rpmm' requests were being treated as 'rpmm' requests.
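To make the distinction concrete, here is a minimal Python sketch of how the two units would be expected to resolve. It is not ShortStack's code (ShortStack is written in Perl), and `resolve_mincov` is a hypothetical name. The number-then-unit form follows the log line quoted above ("0.5rpm"); reading 'rpm' as per million total reads and 'rpmm' as per million mapped reads follows the discussion in this thread. Treating a bare number as a raw count and rounding up are assumptions of the sketch.

```python
import math
import re


def resolve_mincov(spec: str, total_reads: int, mapped_reads: int) -> int:
    """Turn a mincov string such as '0.5rpm', '0.5rpmm' or '10' into a raw count.

    Hypothetical helper: 'rpm' scales by all reads, 'rpmm' by mapped reads.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(rpmm|rpm)?", spec)
    if match is None:
        raise ValueError(f"unrecognised mincov value: {spec!r}")
    value = float(match.group(1))
    unit = match.group(2)
    if unit is None:
        # Assumption: a bare number is already a raw alignment count.
        return math.ceil(value)
    # The denominator is the only difference between the two units.
    denominator = mapped_reads if unit == "rpmm" else total_reads
    # Assumption: round up, and never return a threshold below 1.
    return max(1, math.ceil(value * denominator / 1_000_000))


total_reads = 463_239_047  # figure quoted earlier in the thread
mapped_reads = 11_880_255  # "placed primary reads" from the log line

print(resolve_mincov("0.5rpmm", total_reads, mapped_reads))
print(resolve_mincov("0.5rpm", total_reads, mapped_reads))
```

In a sketch like this, the behaviour reported in the issue corresponds to the denominator always being the mapped-read count, whichever unit was requested.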

I just released version 3.8.5 that fixes the issue. Thank you for letting me know about it.

By the way, why do you have data where only 4% of the reads map to the reference genome? That is a very low rate of alignment!

Lonelyfighter commented 6 years ago

Hi, thanks for checking and fixing this.

I am working on plant-fungus interactions. My samples are plant leaves infected by a fungus at very early time points, when there is very little fungal biomass on the leaves. When we extract RNA from the infected leaves (which contain both plant and fungal sRNA), only a small fraction of the sRNA comes from the fungus at these time points. Based on the transcriptome data we have, at the earliest time point only ~1 to 2% of the RNA reads come from the fungus.

I hope this explains it clearly.

Thanks again,

Xin.

(Sent by email in reply to Mike Axtell's message of 01/25/2018, 05:22 PM.)

MikeAxtell commented 6 years ago

Got it. Sounds like a cool project! Thanks again for letting me know about the bug.

(Sent by email in reply to Lonelyfighter's message of Thu, Jan 25, 2018, 11:28 AM.)