adrlar / CanSNPer

A hierarchical genotype classifier of clonal pathogens
GNU General Public License v3.0

Keep Getting "Classification: None" #26

Open spencer411 opened 5 years ago

spencer411 commented 5 years ago

I have built a database following the instructions on the page and have triple-checked everything, but I always get back a classification of "None" when I run a draft genome.

My suspicion is that this may be because of nested structure in my classification scheme (e.g. a classification may be defined by two or even three SNP calls if one group is nested within another). Does CanSNPer support this kind of scheme, or does each group need to be defined by a single SNP?

Is there any way to trace back what the problem is here, or am I just out of luck?

adrlar commented 5 years ago

Hi @spencer411, CanSNPer does hierarchical classification, and each clade is indeed defined by one single SNP. I am not quite sure what you are describing, but it is possible CanSNPer supports your structure. For example, let's say the root of the tree is defined by SNP1, which has two children, SNP2 and SNP3. The clade SNP2 is then defined as having both SNP1 and SNP2, and your sequence must have both of those SNPs to be classified as SNP2 (unless you specifically tell CanSNPer to "overlook" gaps in the tree).

To get a classification other than None, your sequence must have the root SNP. If the sequence in the above example has only SNP2, you will get a classification of None, since it does not have the root SNP1.
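The rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as explained in this thread, not CanSNPer's actual code: the `PARENT` mapping, the `classify` function, and the idea of passing the sample's derived SNPs as a set are all assumptions made for the example, and the "overlook gaps" option is not modelled.

```python
from typing import Dict, Optional, Set

# Hypothetical tree from the example: child SNP -> parent SNP (the root has no parent).
PARENT: Dict[str, Optional[str]] = {"SNP1": None, "SNP2": "SNP1", "SNP3": "SNP1"}


def path_to_root(snp: str, parent: Dict[str, Optional[str]]) -> list:
    """Return the SNPs from the given clade back up to the root, inclusive."""
    path = []
    current: Optional[str] = snp
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def classify(derived: Set[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the deepest clade whose entire path to the root is derived in the sample.

    Returns None when no clade qualifies, which includes the case where the
    root SNP itself is missing from the sample.
    """
    best: Optional[str] = None
    best_depth = 0
    for snp in sorted(parent):  # sorted only to make ties deterministic
        path = path_to_root(snp, parent)
        if all(s in derived for s in path) and len(path) > best_depth:
            best, best_depth = snp, len(path)
    return best


print(classify({"SNP1", "SNP2"}, PARENT))  # SNP2
print(classify({"SNP2"}, PARENT))          # None, because the root SNP1 is missing
```

Under this reading, a sample that carries only SNP2 can never be classified, which matches the "None" result described for a sequence lacking the root SNP.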

What cannot currently be done is defining a single clade by multiple different SNPs. For example, a scheme in which a sequence with SNP2 OR SNP3 is assigned to the same clade is not supported. (I have not thought about this properly, but I suspect you could construct one or several trees that will get the information you are looking for in this case too, if it's not a common occurrence in your tree structure.)

Does that answer your question?

spencer411 commented 5 years ago

Okay, thanks. Will take a closer look.

davve2 commented 5 years ago

Hi @spencer411. We plan to do a new release of CanSNPer this year that will allow multiple SNPs to define a clade. The update will be released as CanSNPer2.0, hopefully during spring, but with no guarantees.

spencer411 commented 5 years ago

Okay, I've been working on this for two weeks and I am officially nowhere. I have corrected the original problem and no longer get "Classification: None" (I originally had the ancestral and derived SNPs in the wrong order), but now I get an incorrect classification every time I run a draft genome.

I have triple-checked the SNP positions against the reference using multiple programs (gingr and Geneious), and have also checked my own Mauve alignments. In all cases, I am correctly calling my SNPs in the genotype file, yet the classification that comes back from the CanSNPer package is incorrect. There is nested structure as described by adrlar, and I'm not classifying with multiple SNPs as described by davve2, so I don't see why I should be having problems unless I am missing something obvious here.

I have attached the accompanying files in .txt format. Note that the Sample_1_2 file should be classified as 1.2, yet for some reason it is classified as 2.3 when run through the package. Any help is greatly appreciated, as I'd love to get these materials out with a manuscript I'm submitting so others can use this system, but if I cannot get this package to work I may have to resort to a custom Python script (which I would like to avoid).

AE017334_reference_fasta.txt
B_a_snp.txt
B_a_tree.txt
Sample_1_2_fasta.txt
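Since swapped ancestral and derived alleles caused the earlier "None" result, one independent sanity check is to look up which allele the reference itself carries at each SNP position. The following is a minimal sketch under stated assumptions: `SnpDef`, its fields, and `check_against_reference` are hypothetical names for this example and do not reflect CanSNPer's SNP file format or API, positions are assumed to be 1-based on the forward strand, and the toy reference string is made up.

```python
from typing import Dict, List, NamedTuple


class SnpDef(NamedTuple):
    """Hypothetical in-memory SNP definition (not CanSNPer's file format)."""
    name: str
    position: int  # assumed 1-based position in the reference sequence
    derived: str
    ancestral: str


def check_against_reference(reference: str, snps: List[SnpDef]) -> Dict[str, str]:
    """Report which allele the reference carries at each SNP position.

    'neither' suggests a wrong position, an off-by-one error, or a strand mismatch.
    """
    report: Dict[str, str] = {}
    for snp in snps:
        base = reference[snp.position - 1].upper()
        if base == snp.ancestral.upper():
            report[snp.name] = "ancestral"
        elif base == snp.derived.upper():
            report[snp.name] = "derived"
        else:
            report[snp.name] = "neither"
    return report


# Toy data for illustration only.
reference = "ACGTACGT"
snps = [
    SnpDef("snpA", 2, derived="T", ancestral="C"),
    SnpDef("snpB", 3, derived="G", ancestral="A"),
    SnpDef("snpC", 4, derived="A", ancestral="C"),
]
for name, state in check_against_reference(reference, snps).items():
    print(name, state)
```

Whether the reference is expected to be ancestral or derived at a given SNP depends on where the reference strain sits in the tree, so the report is only meaningful when compared against that expectation.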
