Closed RubioB closed 1 year ago
We prefer to call clusters just once, based on the merge of all data, i.e. from a single merged.bam file. Calling clusters multiple times on different datasets will give slightly (or grossly) different cluster locations. It's better to define loci just once and then run differential expression analysis on all of those clusters.
Hi Mike,
I have a question regarding ShortStack analysis of samples with very different starting numbers of reads. My analysis has 4 conditions with 3 replicates each. In two of the conditions, one replicate has far more reads than the others: about 2 times more in one condition and almost 4 times more in the other.
I ran the ShortStack analysis independently on the four conditions, and I find more clusters in those with the unequal library sizes than in the others. I tried to correct for this by keeping only clusters that are identified in all three replicates with a minimum of 3 reads per replicate. I also filter on the coefficient of variation of the read counts, which must be less than or equal to 50% across the three replicates of each condition.
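For illustration, the replicate filter described above could be sketched as follows. This is a minimal sketch, not the script actually used: the cluster names and read counts are hypothetical, and it assumes the coefficient of variation is computed on raw counts with the sample standard deviation.

```python
from statistics import mean, stdev

MIN_READS = 3   # minimum reads per replicate, as described above
MAX_CV = 0.5    # coefficient of variation threshold (50%)

def passes_filter(counts, min_reads=MIN_READS, max_cv=MAX_CV):
    """Return True if a cluster's replicate counts pass both filters."""
    # The cluster must be supported by enough reads in every replicate.
    if any(c < min_reads for c in counts):
        return False
    # CV = standard deviation / mean (sample standard deviation is an assumption).
    cv = stdev(counts) / mean(counts)
    return cv <= max_cv

# Hypothetical read counts for the three replicates of one condition.
clusters = {
    "Cluster_1": [120, 135, 110],  # consistent across replicates
    "Cluster_2": [10, 12, 45],     # one replicate much higher than the others
    "Cluster_3": [2, 40, 38],      # below the minimum in one replicate
}

kept = [name for name, counts in clusters.items() if passes_filter(counts)]
print(kept)
```

Because the coefficient of variation here is computed on raw counts, a replicate with a much larger library inflates it for every cluster, which is one reason the filter behaves differently in the two affected conditions.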
I can manage the effect of the size difference for the condition with a replicate that has 4 times more reads: the variation between replicates is so large that the filter based on the coefficient of variation works well. For the condition with a replicate that has 2 times more reads, however, the impact of the library size difference persists in the final number of clusters found.
I used the mincov parameter with rpm, which according to the documentation takes the difference in library sizes into account, but I'm not sure that this is enough.
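As a reminder of what a reads-per-million (rpm) threshold means, here is a small sketch of the arithmetic. It is not ShortStack's internal code; the library sizes and the 0.5 rpm cutoff are hypothetical values chosen only to mirror a replicate with twice as many reads.

```python
def to_rpm(reads, library_size):
    """Convert a raw read count to reads per million."""
    return reads * 1_000_000 / library_size

def raw_threshold(rpm_cutoff, library_size):
    """Raw read count that corresponds to an rpm cutoff in a given library."""
    return rpm_cutoff * library_size / 1_000_000

# Hypothetical library sizes: the third replicate has twice as many reads.
library_sizes = {"rep1": 10_000_000, "rep2": 10_000_000, "rep3": 20_000_000}
RPM_CUTOFF = 0.5  # hypothetical cutoff, for illustration only

thresholds = {
    rep: raw_threshold(RPM_CUTOFF, size) for rep, size in library_sizes.items()
}
for rep, reads_needed in thresholds.items():
    print(f"{rep}: {reads_needed:g} raw reads needed to reach {RPM_CUTOFF} rpm")
```

With these numbers, the same rpm cutoff corresponds to twice as many raw reads in the larger library, so the threshold scales with depth even though the raw counts differ.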
Can you tell me how to deal with these differences and get results that I can compare between conditions? Is there something that I didn't understand or did wrong?
Thanks in advance!
Bernadette