eturro / mmseq

Haplotype, isoform and gene level expression analysis using multi-mapping RNA-seq reads
GNU General Public License v2.0
67 stars 20 forks source link

Setting up matrix and identifying specific gene clusters #17

Open abhisheksinghnl opened 8 years ago

abhisheksinghnl commented 8 years ago

Hi,

I have same sample for 7 different time points and each with replicates. I would like to find out gene clusters for each time point that define that time point (gene signatures).

Here is the table of replicates and time points:

T0  4
T1  3
T2  3
T3  3
T4  4
T5  4
T6  7

The matrix that I am using is:

# M; no. of rows = no. of observations
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0 
0 1 0 0 0 0 0
0 1 0 0 0 0 0
0 1 0 0 0 0 0
0 0 1 0 0 0 0
0 0 1 0 0 0 0
0 0 1 0 0 0 0
0 0 0 1 0 0 0
0 0 0 1 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 1 0 0
0 0 0 0 1 0 0
0 0 0 0 1 0 0
0 0 0 0 0 1 0
0 0 0 0 0 1 0
0 0 0 0 0 1 0
0 0 0 0 0 1 0
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1

The command that I am using with mmdiff is :

`~/Softwares/MMseq/mmseq-latest/bin/mmdiff-linux -fixalpha -p 0 -T0_1.namesorted.bam.hits.gene.mmseq T0_2.namesorted.bam.hits.gene.mmseq T0_3.namesorted.bam.hits.gene.mmseq T0_4.namesorted.bam.hits.gene.mmseq T1_1.namesorted.bam.hits.gene.mmseq T1_2.namesorted.bam.hits.gene.mmseq T1_3.namesorted.bam.hits.gene.mmseq T2_1.namesorted.bam.hits.gene.mmseq T2_2.namesorted.bam.hits.gene.mmseq T2_3.namesorted.bam.hits.gene.mmseq T3_1.namesorted.bam.hits.gene.mmseq T3_2.namesorted.bam.hits.gene.mmseq T3_3.namesorted.bam.hits.gene.mmseq T4_1.namesorted.bam.hits.gene.mmseq T4_2.namesorted.bam.hits.gene.mmseq T4_3.namesorted.bam.hits.gene.mmseq T4_4.namesorted.bam.hits.gene.mmseq T5_1.namesort.bam.hits.gene.mmseq T5_2.namesort.bam.hits.gene.mmseq T5_3.namesort.bam.hits.gene.mmseq T5_4.namesort.bam.hits.gene.mmseq T6_1.namesort.bam.hits.gene.mmseq T6_2.namesort.bam.hits.gene.mmseq T6_3.namesort.bam.hits.gene.mmseq T6_4.namesort.bam.hits.gene.mmseq T6_5.namesort.bam.hits.gene.mmseq T6_6.namesort.bam.hits.gene.mmseq T6_7.namesort.bam.hits.gene.mmseq > out.mmdiff_p0_fixalpha

` My first question, is everything correct till here ? if not could you please edit.

Second question: Once I have the output file from mmdiff, how to proceed to get clusters of time stage specific genes? I mean, how can I filter them out from the mmdiff output file or is it already done?

Could you please find sometime to answer these and guide me in my analysis.

Thank you.

Best regards Abhishek

eturro commented 8 years ago

You're interested in identifying features which are over-expressed at particular time points relative to the other time points?

abhisheksinghnl commented 8 years ago

Yes, indeed.

On Tue, Sep 20, 2016 at 1:05 PM, Ernest Turro notifications@github.com wrote:

You're interested in identifying features which are over-expressed at particular time points relative to the other time points?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-248270595, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAA575OO_QLi7zodT97ICOFbZLzE-P2ks5qr73ggaJpZM4KBa0d .

eturro commented 8 years ago

OK, in that case what I suggest is that you compare each time point to all the others in a standard differential expression analysis and then do a polytomous model comparison. To simplify, suppose you had t1, t2, t3 with 3 reps in each. You would have to create three different matrices files:

## matrices_file1:

# M; no. of rows = no. of observations
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 0
0 0
0 0
0 1
0 1
0 1
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

## matrices_file2:

# M; no. of rows = no. of observations
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 1
0 0
0 0
0 0
0 1
0 1
0 1
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

## matrices_file3:

# M; no. of rows = no. of observations
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 1 0
0 0 1
0 0 1
0 0 1
# C; no. of rows = no. of observations and no. of columns = 2 (one for each model)
0 1
0 1
0 1
0 1
0 1
0 1
0 0
0 0
0 0
# P0(collapsed); no. of rows = no. of classes for model 0
1
# P1(collapsed); no. of rows = no. of classes for model 1
.5
-.5

You would run mmdiff three times.

mmdiff -m matrices_file1 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond1vsOthers.mmdiff

mmdiff -m matrices_file2 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond2vsOthers.mmdiff

mmdiff -m matrices_file3 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond3vsOthers.mmdiff

Once you're done, you can do a polytomous analysis in R:

source("/path/to/mmseq/src/mmseq.R")

res=polyclass(files=c("cond1vsOthers.mmdiff", "cond2vsOthers.mmdiff","cond3vsOthers.mmdiff"), prior=c(0.7,0.1,0.1,0.1)) # this would set the baseline model prior probability to 0.7 and that of the other models to 0.1

Now the res object will contain everything you need, including the posterior probability for each model and the estimated differential expression "eta" parameters (to distinguish between up and down-regulation at each time point).

abhisheksinghnl commented 8 years ago

Hi,

Thank you, I will try to adopt these for 7 time points.

May I bother you in future incase I run into trouble. :D

Thank you

On Tue, Sep 20, 2016 at 1:16 PM, Ernest Turro notifications@github.com wrote:

OK, in that case what I suggest is that you compare each time point to all the others in a standard differential expression analysis and then do a polytomous model comparison. To simplify, suppose you had t1, t2, t3 with 3 reps in each. You would have to create three different matrices files: matrices_file1: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file2: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file3: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

You would run mmdiff three times.

mmdiff -m matrices_file1 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond1vsOthers.mmdiff

mmdiff -m matrices_file2 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond2vsOthers.mmdiff

mmdiff -m matrices_file3 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond3vsOthers.mmdiff

Once you're done, you can do a polytomous analysis in R:

source("/path/to/mmseq/src/mmseq.R")

res=polyclass(files=c("cond1vsOthers.mmdiff", "cond2vsOthers.mmdiff","cond3vsOthers.mmdiff"), prior=c(0.7,0.1,0.1,0.1)) # this would set the baseline model prior probability to 0.7 and that of the other models to 0.1

Now the res object will contain everything you need, including the posterior probability for each model and the estimated differential expression "eta" parameters (to distinguish between up and down-regulation at each time point).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-248272692, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAA58dc2RA17aeCCeo-bRmFk51XWCd7ks5qr8CGgaJpZM4KBa0d .

abhisheksinghnl commented 7 years ago

Hi,

I have one more question, as their a way to control the number of processors that mmdiff uses? by default it is using all the available threads.

Thank you

Best regards Abhishek

On Tue, Sep 20, 2016 at 1:22 PM, Abhishek Singh abhisheksinghnl@gmail.com wrote:

Hi,

Thank you, I will try to adopt these for 7 time points.

May I bother you in future incase I run into trouble. :D

Thank you

On Tue, Sep 20, 2016 at 1:16 PM, Ernest Turro notifications@github.com wrote:

OK, in that case what I suggest is that you compare each time point to all the others in a standard differential expression analysis and then do a polytomous model comparison. To simplify, suppose you had t1, t2, t3 with 3 reps in each. You would have to create three different matrices files: matrices_file1: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file2: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file3: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

You would run mmdiff three times.

mmdiff -m matrices_file1 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond1vsOthers.mmdiff

mmdiff -m matrices_file2 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond2vsOthers.mmdiff

mmdiff -m matrices_file3 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond3vsOthers.mmdiff

Once you're done, you can do a polytomous analysis in R:

source("/path/to/mmseq/src/mmseq.R")

res=polyclass(files=c("cond1vsOthers.mmdiff", "cond2vsOthers.mmdiff","cond3vsOthers.mmdiff"), prior=c(0.7,0.1,0.1,0.1)) # this would set the baseline model prior probability to 0.7 and that of the other models to 0.1

Now the res object will contain everything you need, including the posterior probability for each model and the estimated differential expression "eta" parameters (to distinguish between up and down-regulation at each time point).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-248272692, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAA58dc2RA17aeCCeo-bRmFk51XWCd7ks5qr8CGgaJpZM4KBa0d .

abhisheksinghnl commented 7 years ago

Hi,

I got it, it is OMP_NUM_THREADS

Best Abhishek

On Thu, Sep 22, 2016 at 9:26 AM, Abhishek Singh abhisheksinghnl@gmail.com wrote:

Hi,

I have one more question, as their a way to control the number of processors that mmdiff uses? by default it is using all the available threads.

Thank you

Best regards Abhishek

On Tue, Sep 20, 2016 at 1:22 PM, Abhishek Singh <abhisheksinghnl@gmail.com

wrote:

Hi,

Thank you, I will try to adopt these for 7 time points.

May I bother you in future incase I run into trouble. :D

Thank you

On Tue, Sep 20, 2016 at 1:16 PM, Ernest Turro notifications@github.com wrote:

OK, in that case what I suggest is that you compare each time point to all the others in a standard differential expression analysis and then do a polytomous model comparison. To simplify, suppose you had t1, t2, t3 with 3 reps in each. You would have to create three different matrices files: matrices_file1: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file2: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5 matrices_file3: M; no. of rows = no. of observations

1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

You would run mmdiff three times.

mmdiff -m matrices_file1 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond1vsOthers.mmdiff

mmdiff -m matrices_file2 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond2vsOthers.mmdiff

mmdiff -m matrices_file3 cond1_rep1.mmseq cond1_rep2.mmseq cond1_rep3.mmseq cond2_rep1.mmseq cond2_rep2.mmseq cond2_rep3.mmseq cond3_rep1.mmseq cond3_rep2.mmseq cond3_rep3.mmseq > cond3vsOthers.mmdiff

Once you're done, you can do a polytomous analysis in R:

source("/path/to/mmseq/src/mmseq.R")

res=polyclass(files=c("cond1vsOthers.mmdiff", "cond2vsOthers.mmdiff","cond3vsOthers.mmdiff"), prior=c(0.7,0.1,0.1,0.1)) # this would set the baseline model prior probability to 0.7 and that of the other models to 0.1

Now the res object will contain everything you need, including the posterior probability for each model and the estimated differential expression "eta" parameters (to distinguish between up and down-regulation at each time point).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-248272692, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAA58dc2RA17aeCCeo-bRmFk51XWCd7ks5qr8CGgaJpZM4KBa0d .

abhisheksinghnl commented 7 years ago

Hi,

I am getting this error (below), while running mmdiff for 3 groups each with 3 replicates.

Design matrix for model 0 ([1|M]): Error: collinearity in combined matrix of intercept and covariates for model 0

The command that I am using is :

~/Softwares/MMseq/mmseq/bin/mmdiff-linux -m mat1.txt /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_3.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_3.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_3.namesorted.bam.hits.gene.mmseq > E18vsOthers.mmdiff

and this is what happens:

Min unique hits fraction for normalisation: 1
No. threads: 16
Parsing /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_3.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_3.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_1.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_2.namesorted.bam.hits.gene.mmseq /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_3.namesorted.bam.hits.gene.mmseq 
Analysing 32637 features
Using 12943/32637 features for normalisation.
Log scale normalisation factors:
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_1.namesorted.bam.hits.gene.mmseq   -0.0922577
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_2.namesorted.bam.hits.gene.mmseq   0.157586
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/E18/E18_3.namesorted.bam.hits.gene.mmseq   -0.309913
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_1.namesorted.bam.hits.gene.mmseq 0.370167
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_2.namesorted.bam.hits.gene.mmseq 0.127327
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P1/P1_3.namesorted.bam.hits.gene.mmseq -0.515323
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_1.namesorted.bam.hits.gene.mmseq 0.04334
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_2.namesorted.bam.hits.gene.mmseq 0.1999
    /media/data/u0106457/FASTQ/Rat/Ageing/AlignedBowtie1/P3/P3_3.namesorted.bam.hits.gene.mmseq 0.153414
Design matrix for model 0 ([1|M]):
Error: collinearity in combined matrix of intercept and covariates for model 0

Could you please let me know, how to fix this.

Thank you

Best regards Abhishek

ndrubins commented 7 years ago

Hi @eturro,

I went into this post hoping that it specifies a design matrix for a estimating the effect of time on expression levels rather than all pair-wise certain time-point vs. baseline time-point analyses. But perhaps more generally, can you think of a way to translate an R syntax lm or model matrix into an mmdiff design matrix?

Thanks a lot

eturro commented 7 years ago

If you specify the linear model you're after I can help you with the mmdiff matrices file.

ndrubins commented 7 years ago

For this specific question I'm interested in two cases:

  1. I have n samples for m time points (not from them same subjects hence no autocorrelation), and I want to estimate the effect of time on expression.

Is this matrix (for 3 samples from 3 time points) the correct one:

M; no. of rows = no. of observations

0 0 0 0 0 0 0 0 0

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 2 0 2 0 2

P0(collapsed); no. of rows = no. of classes for model 0

1

P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

And then the second case is having two groups: treatment and control and, n samples from each per each of the m from time points, and I'm interested in estimating effect of time on the treatment vs. control fold-change, which I guess is simply the treatment x time interaction term.

eturro commented 7 years ago

Hello The number of rows of the P matrix must always correspond to the number of levels in the corresponding C column, so what you have shown is not valid (two rows for P1 but four levels for C[,2]).

Do you want to assume that that the change in expression between time point 0 and time point 1 is the same as between time point 1 and time point 2?

In that case you could use the following and eta1_0 would represent the log fold change between time point 0 and time point 2 (twice the log fold changes between time points 0 and 1 or 1 and 2):

M; no. of rows = no. of observations

0 0 0 0 0 0 0 0 0

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 2 0 2 0 2

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 .5

If you would like to introduce a treatment and a treatment x time interaction, then I would compare multiple models and use polyclass() to perform polytomous model selection. Let's assume you have two reps in each group.

Baseline model (model 0): global mean.

Comparison with model 1: time effect ( as above ):

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 1 0 1 0 2 0 2 0 0 0 0 0 1 0 1 0 2 0 2

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 .5

Comparison with model 2: time effect and a treatment effect:

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 1 0 1 0 2 0 2 0 3 0 3 0 4 0 4 0 5 0 5

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 -.5 0 -.5 .5 -.5 -.5 .5 0 .5 .5 .5

Comparison with model 3: treatment-specific time effect and a treatment effect:

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 1 0 1 0 2 0 2 0 3 0 3 0 4 0 4 0 5 0 5

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 -.5 0 0 -.5 .5 0 -.5 0 -.5 .5 0 0 .5 0 .5 .5

I hope the interpretations of the estimated eta1_* coefficients is straightforward.

best wishes Ernest

On 18 Jul 2017, at 17:45, ndrubins notifications@github.com wrote:

For this specific question I'm interested in two cases:

I have n samples for m time points (not from them same subjects hence no autocorrelation), and I want to estimate the effect of time on expression. Is this matrix (for 3 samples from 3 time points) the correct one:

M; no. of rows = no. of observations

0 0 0 0 0 0 0 0 0

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 2 0 2 0 3

P0(collapsed); no. of rows = no. of classes for model 0

1

P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

And then the second case is having two groups: treatment and control and, n samples from each per each of the m from time points, and I'm interested in estimating effect of time on the treatment vs. control fold-change, which I guess is simply the treatment x time interaction term.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-316124755, or mute the thread https://github.com/notifications/unsubscribe-auth/AEM2EOEgmkVRQXFQBRcIvXkplJSzdYtwks5sPOESgaJpZM4KBa0d.

ndrubins commented 7 years ago

Thanks a lot

On Wed, Jul 19, 2017 at 8:43 AM, Ernest Turro notifications@github.com wrote:

Hello The number of rows of the P matrix must always correspond to the number of levels in the corresponding C column, so what you have shown is not valid (two rows for P1 but four levels for C[,2]).

Do you want to assume that that the change in expression between time point 0 and time point 1 is the same as between time point 1 and time point 2?

In that case you could use the following and eta1_0 would represent the log fold change between time point 0 and time point 2 (twice the log fold changes between time points 0 and 1 or 1 and 2):

M; no. of rows = no. of observations

0 0 0 0 0 0 0 0 0

C; no. of rows = no. of observations and no. of columns = 2 (one for

each model) 0 0 0 0 0 0 0 1 0 1 0 1 0 2 0 2 0 2

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 .5

If you would like to introduce a treatment and a treatment x time interaction, then I would compare multiple models and use polyclass() to perform polytomous model selection. Let's assume you have two reps in each group.

Baseline model (model 0): global mean.

Comparison with model 1: time effect ( as above ):

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for

each model) 0 0 0 0 0 1 0 1 0 2 0 2 0 0 0 0 0 1 0 1 0 2 0 2

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 .5

Comparison with model 2: time effect and a treatment effect:

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for

each model) 0 0 0 0 0 1 0 1 0 2 0 2 0 3 0 3 0 4 0 4 0 5 0 5

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 -.5 0 -.5 .5 -.5 -.5 .5 0 .5 .5 .5

Comparison with model 3: treatment-specific time effect and a treatment effect:

M; no. of rows = no. of observations

0 # untreated t0 0 # untreated t0 0 # untreated t1 0 # untreated t1 0 # untreated t2 0 # untreated t2 0 # treated t0 0 # treated t0 0 # treated t1 0 # treated t1 0 # treated t2 0 # treated t2

C; no. of rows = no. of observations and no. of columns = 2 (one for

each model) 0 0 0 0 0 1 0 1 0 2 0 2 0 3 0 3 0 4 0 4 0 5 0 5

P0(collapsed); no. of rows = no. of classes for model 0

1 P1(collapsed); no. of rows = no. of classes for model 1 -.5 0 -.5 0 0 -.5 .5 0 -.5 0 -.5 .5 0 0 .5 0 .5 .5

I hope the interpretations of the estimated eta1_* coefficients is straightforward.

best wishes Ernest

On 18 Jul 2017, at 17:45, ndrubins notifications@github.com wrote:

For this specific question I'm interested in two cases:

I have n samples for m time points (not from them same subjects hence no autocorrelation), and I want to estimate the effect of time on expression. Is this matrix (for 3 samples from 3 time points) the correct one:

M; no. of rows = no. of observations

0 0 0 0 0 0 0 0 0

C; no. of rows = no. of observations and no. of columns = 2 (one for each model)

0 0 0 0 0 0 0 1 0 1 0 1 0 2 0 2 0 3

P0(collapsed); no. of rows = no. of classes for model 0

1

P1(collapsed); no. of rows = no. of classes for model 1

.5 -.5

And then the second case is having two groups: treatment and control and, n samples from each per each of the m from time points, and I'm interested in estimating effect of time on the treatment vs. control fold-change, which I guess is simply the treatment x time interaction term.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/eturro/mmseq/issues/17#issuecomment-316124755>, or mute the thread https://github.com/notifications/unsubscribe-auth/ AEM2EOEgmkVRQXFQBRcIvXkplJSzdYtwks5sPOESgaJpZM4KBa0d.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/eturro/mmseq/issues/17#issuecomment-316428822, or mute the thread https://github.com/notifications/unsubscribe-auth/ABGsHsDBJ66JrhNWKV6gGhw17ZeR7e0Fks5sPiQ-gaJpZM4KBa0d .