FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

C>T SNPs preventing the assignment #33

Closed: GuidoBarzaghi closed this issue 3 years ago

GuidoBarzaghi commented 4 years ago

Hello!

I was wondering how SNPsplit behaves when a given read contains a C>T SNP as well as another one which would allow assignment. Is the read discarded altogether because of the C>T SNP or is the other "useful" SNP allowing assignment?

Thanks for your support!

FelixKrueger commented 4 years ago

I reckon you are talking about the case where SNPsplit is run in --bisulfite mode? I just had a look at the code that stores the SNPs, and it looks like SNPs are only excluded if the bases involve both C and T (on either strand). That should mean that if you had another SNP at the same position in the input file (e.g. C / G), then this SNP would be stored. By the looks of it, if there was yet another SNP at that position (more aren't possible), then this last one would overwrite the stored entry. I think overall SNPsplit is designed to work with only a single SNP per position, and I would advise keeping it that way so you don't run into weird corner cases...
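
As a rough illustration of the storing behaviour described above, here is a minimal Python sketch. It is not SNPsplit's actual code (SNPsplit is written in Perl); the function name `store_snps`, the `(chrom, pos, ref, alt)` record layout, and the dictionary keyed by position are all assumptions made for the example. It also only filters the literal C/T pair in either order, since the comment does not spell out how "on either strand" is handled.

```python
def store_snps(snp_records, bisulfite=True):
    """Hypothetical sketch: keep one SNP per position, skipping C/T pairs.

    snp_records: iterable of (chrom, pos, ref, alt) tuples (assumed layout).
    Returns a dict mapping (chrom, pos) -> (ref, alt).
    """
    stored = {}
    for chrom, pos, ref, alt in snp_records:
        ref, alt = ref.upper(), alt.upper()
        # In bisulfite mode a C/T (or T/C) SNP cannot be told apart from
        # a conversion event, so it is never stored.
        if bisulfite and {ref, alt} == {"C", "T"}:
            continue
        # A later record at the same position overwrites the earlier one.
        stored[(chrom, pos)] = (ref, alt)
    return stored


records = [
    ("chr1", 100, "C", "T"),  # skipped in bisulfite mode
    ("chr1", 100, "C", "G"),  # stored for position 100
    ("chr1", 250, "A", "G"),  # stored, then overwritten by the next record
    ("chr1", 250, "A", "C"),
]
stored = store_snps(records)
```

Under these assumptions, position 100 ends up holding the C/G SNP and position 250 holds whichever record came last, which is the corner case a single-SNP-per-position input file avoids.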

GuidoBarzaghi commented 4 years ago

Thanks for your exceptionally quick reply! Yes, I am dealing with bisulfite data.

I see, so another SNP at the exact same genomic coordinate would do the trick. But I guess my question was more oriented towards a read-wise view (sorry for my poor phrasing): given a read where two distinct positions harbour a SNP, and given that one of those SNPs is a C/T (or T/C) but the other is not (and can thus be used to assign the allelic origin of the read), what happens to this read? Is it discarded altogether due to the presence of the C/T (T/C) SNP or is it kept and assigned using the other SNP position?

Given your answer, I have the feeling that the read would be kept and assigned since, to my current understanding, C/T (T/C) SNP positions are first excluded from the list of SNPs and the assignment then takes place with the remaining SNPs. Is that correct?

FelixKrueger commented 4 years ago

Yes, that is correct. SNPs that involve C/T (or T/C) transitions are simply not stored internally, which means they do not take part in the allele assignment. If a read contains additional, non-C/T SNPs, it may be assigned in the usual way. Sorry for not catching this point in the first place.
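
To make the read-wise view concrete, the following Python sketch shows how a read overlapping both a C/T SNP and another SNP could still be assigned. This is an illustration under assumptions, not SNPsplit's implementation: `assign_read`, the position-to-base dictionary for a read, and the result labels are hypothetical, and the sketch ignores how bisulfite conversion might alter the base observed at the remaining SNP.

```python
def assign_read(read_bases, stored_snps):
    """Hypothetical sketch of read-wise allele assignment.

    read_bases: dict mapping genomic position -> base observed in the read.
    stored_snps: dict mapping position -> (genome1_base, genome2_base);
                 C/T SNPs are assumed to have been left out already.
    """
    genome1_hits = genome2_hits = 0
    for pos, base in read_bases.items():
        alleles = stored_snps.get(pos)
        if alleles is None:
            # Position carries no stored SNP (e.g. an excluded C/T SNP),
            # so it has no say in the assignment.
            continue
        if base == alleles[0]:
            genome1_hits += 1
        elif base == alleles[1]:
            genome2_hits += 1
    if genome1_hits and genome2_hits:
        return "conflicting"
    if genome1_hits:
        return "genome1"
    if genome2_hits:
        return "genome2"
    return "unassignable"


# The C/T SNP at position 100 was never stored; only the A/G SNP at 130 is.
stored_snps = {130: ("A", "G")}

both_snps = assign_read({100: "T", 130: "G"}, stored_snps)
only_ct_snp = assign_read({100: "T"}, stored_snps)
```

In this sketch the first read is assigned via position 130 alone, while a read covering only the excluded C/T position has nothing to be assigned by.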