bomeara / treevo


boxcoxtransformation #33

Closed Junjun-Xu closed 3 years ago

Junjun-Xu commented 4 years ago

Hi, thank you very much for your script. Is there a faster method for dealing with large datasets? Wheat has nearly 120,000 genes, and running boxcox from the MASS package on all of them takes a long time. Can you give me some suggestions? Thanks a lot. Best wishes! Jun

Junjun-Xu commented 4 years ago

By the way, I would also like to know why the formula is "variable~1". What does this mean? Thanks a lot.

dwbapst commented 4 years ago

Hi Junjun,

I'm a little curious - I wouldn't recommend using our code unless you are trying to apply comparative methods analyses with TreEvo. (And your comments about the size of the wheat genome suggest you aren't.)

But yes, our code uses the MASS package to get it done. As I recall, its usage was well explained here:

https://stackoverflow.com/questions/33999512/how-to-use-the-box-cox-power-transformation-in-r/34002020

That will probably explain why the model provided to the function is just a regression on the intercept.
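To make the intercept-only point concrete, here is a minimal sketch of the quantity a Box-Cox fit of `y ~ 1` maximizes. It is written in plain Python rather than R, it is not TreEvo's or MASS's actual code, and the function names are hypothetical. With no predictors, the fitted value is just the mean of the transformed data, so the residual sum of squares reduces to the spread around that mean. The grid from -2 to 2 in steps of 0.1 is assumed to mirror the default `lambda` range of `MASS::boxcox`.

```python
import math

def boxcox_transform(y, lam):
    """Box-Cox power transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        # The lambda -> 0 limit of (y**lam - 1) / lam is log(y).
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lambda for the intercept-only model y ~ 1."""
    n = len(y)
    z = boxcox_transform(y, lam)
    mean_z = sum(z) / n
    # For an intercept-only fit, the residuals are deviations from the mean.
    rss = sum((v - mean_z) ** 2 for v in z)
    # Second term is the log-Jacobian of the transform.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid=None):
    """Grid search for the lambda with the highest profile log-likelihood."""
    if grid is None:
        # Assumed default: -2 to 2 in steps of 0.1.
        grid = [i / 10.0 for i in range(-20, 21)]
    return max(grid, key=lambda lam: profile_loglik(y, lam))

# Example: data that are symmetric on the log scale.
y = [math.exp(x) for x in (-2, -1, 0, 1, 2)]
chosen = best_lambda(y)
```

In this example the log of the data is already symmetric, so the search should settle on the log transform. The formula `variable ~ 1` only says "estimate lambda from the distribution of this one variable, with no covariates".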

I don't think we've seen a faster boxcox transformation. We didn't look too hard for one; unfortunately it was far from our worst bottleneck in terms of computation time on 'small' (~30 taxon) datasets.

Junjun-Xu commented 4 years ago

Hi, thank you very much for your timely reply. You're right, I am not using your code; I just want to standardize the residuals of my RNA-seq data. As for lm(y ~ 1), I still don't know why I should do this. Can you give me some advice? How can I quickly standardize my data? Thanks a lot! Jun

dwbapst commented 3 years ago

@Junjun-Xu helping you do a Box-Cox transformation to standardize your data isn't really relevant to this repository's issue tracker, so I am closing this issue.