Closed luckysardar closed 1 year ago
Just to understand your setup, you constructed a dual hybrid 129S1/CAST strain with SNPsplit, then aligned RNA-seq data against it, and then did what exactly, run ASE readcounter, using which files? Where did you get the positions (and annotations) from, and how does this compare to the filtered list that is used by SNPsplit?
Conceptually, I think it is fairly easy to picture a scenario like this one:
ENSMUSG00000028989 4 148495681 30 48 Angptl7
ENSMUSG00000028989 4 148495686 41 51 Angptl7
ENSMUSG00000028989 4 148495785 47 28 Angptl7
ENSMUSG00000028989 4 148495809 45 32 Angptl7
ENSMUSG00000028989 4 148495841 37 33 Angptl7
ENSMUSG00000028989 4 148495844 37 32 Angptl7
ENSMUSG00000028989 4 148495966 28 16 Angptl7
ENSMUSG00000028989 4 148495980 31 16 Angptl7
ENSMUSG00000028989 4 148496076 24 36 Angptl7
ENSMUSG00000028989 4 148496128 21 39 Angptl7
ENSMUSG00000028989 4 148496299 26 26 Angptl7
ENSMUSG00000028989 4 148498078 57 49 Angptl7
ENSMUSG00000028989 4 148500125 0 72 Angptl7
ENSMUSG00000028989 4 148500308 25 25 Angptl7
ENSMUSG00000028989 4 148500456 13 12 Angptl7
Where you seem to have biallelic counts for most positions in the Angptl7 gene, only position 148500125
seems to be exclusively the CAST allele. In such a case, it would seem likely that the annotated position in 129S1 is incorrect, i.e. both alleles in your strain harbour the same (CAST) allele in this case.
Generally, I would be worried if you are getting very high counts for one allele, and exactly 0 for the other allele, as this might be a dropout. Situations like this look more credible to me:
ENSMUSG00000042271 X 142238172 0 27 Nxt2
ENSMUSG00000042271 X 142238313 2 14 Nxt2
ENSMUSG00000042271 X 142238460 6 42 Nxt2
ENSMUSG00000042271 X 142238554 2 43 Nxt2
ENSMUSG00000042271 X 142238748 0 23 Nxt2
ENSMUSG00000042271 X 142238809 2 27 Nxt2
ENSMUSG00000042271 X 142238836 1 38 Nxt2
ENSMUSG00000042271 X 142238967 0 16 Nxt2
Generally, I don't think I would have a biological explanation why you would see reads for only one allele for some part of a gene, and bi-allelic expression for another part. In a case like this:
ENSMUSG00000042396 9 48489361 0 269 Rbm7
ENSMUSG00000042396 9 48489437 0 320 Rbm7
ENSMUSG00000042396 9 48494073 145 302 Rbm7
You might have bi-allelic expression of Rbm7 and both positions 48489361
and 48489437
are incorrectly annotated for the 129S1 allele, or maybe something is 'wrong' with position 48494073
(which is 5kb further downstream)? Maybe that position is overlaps with another gene, or an antisense transcript of some sort? This is impossible to answer just by looking at some numbers, I would suggest you import the 129S1, CAST and Unassigned reads next to each other (e.g. in SeqMonk) and see if that yields a clue?
Haven't heard back in a while, New-Year-Closing.
Hello Felix
SNPsplit is a great tool its helping me a lot in my analysis. To validate the output file I got snp level count from ASE readcounter for each gene for two separate genomes. individual genes are having handful number of SNPs but one observation I made is between SNPs in a gene showing both monoallelic and biallelic SNP count can you please suggest me what is the issue going on strain I am having is 129S1 vs CAST and am attaching one example file