Closed (ucassee closed 3 months ago)
Hello,
1) Do you have multiple lines for the same gene in the raw_data file, one for each mm? The mm thing is pretty weird and can probably explain this discrepancy: https://instrain.readthedocs.io/en/v1.3.0/Advanced_use.html#dealing-with-mm
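2) Each base can be mutated to one of 3 other bases, and some of those mutations are synonymous ("S") and some are nonsynonymous ("N") based on the codon table. Each possible mutation counts as one third of a site, so S_sites and N_sites can have a fractional part of one third or two thirds (0.33 or 0.66). inStrain uses the standard method of calculating pN/pS.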
Best, Matt
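To make point 2 concrete, here is a minimal sketch of how fractional site counts arise. It assumes the standard genetic code and a Nei-Gojobori-style count in which each of the 3 possible substitutions at a base contributes 1/3 of a site, with changes to or from a stop codon treated as nonsynonymous. The function names `codon_sites` and `gene_sites` are hypothetical, and this is not inStrain's actual code.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_sites(codon):
    """Return (S_sites, N_sites) for one codon; the two always sum to 3."""
    original_aa = CODON_TABLE[codon]
    s_sites = Fraction(0)
    n_sites = Fraction(0)
    for pos in range(3):
        for alt in BASES:
            if alt == codon[pos]:
                continue
            mutated = codon[:pos] + alt + codon[pos + 1:]
            # Each of the 3 possible substitutions is worth 1/3 of a site.
            if CODON_TABLE[mutated] == original_aa:
                s_sites += Fraction(1, 3)
            else:
                n_sites += Fraction(1, 3)
    return s_sites, n_sites


def gene_sites(sequence):
    """Sum (S_sites, N_sites) over the complete, unambiguous codons of a gene."""
    sequence = sequence.upper()
    s_total = Fraction(0)
    n_total = Fraction(0)
    for i in range(0, len(sequence) - len(sequence) % 3, 3):
        codon = sequence[i:i + 3]
        if codon not in CODON_TABLE:  # skip codons containing N, gaps, etc.
            continue
        s, n = codon_sites(codon)
        s_total += s
        n_total += n
    return s_total, n_total


s, n = gene_sites("ATGTTTGGG")
print(round(float(s), 2), round(float(n), 2))  # 1.33 7.67
```

ATG has no synonymous substitutions, TTT has one (TTC), and GGG has three (any third-position change), so the toy gene has 0 + 1/3 + 1 = 4/3 synonymous sites, which is why the totals are not integers.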
Hi Matt,
Thanks for your reply.
Yes, there are multiple lines for the same gene in the genes_SNP_count.csv file, representing different mismatch levels.
Which lines does inStrain actually use to count SNPs for the final result in the IS_gene_info.tsv file?
Yingli
That's a complicated question, as indicated here: https://instrain.readthedocs.io/en/v1.3.0/Advanced_use.html#dealing-with-mm. The raw data files aren't really meant for users to look at. Is there some reason you want to use that raw file instead of the IS_gene_info.tsv file?
Matt
No, I just wanted to confirm that using the gene_info.tsv file is a suitable choice. I think it is okay.
Thanks
Hi developer,
Why are the values of SNV_N_count, SNV_S_count and pNpS_variants in output/xxx.IS_gene_info.tsv not the same as in raw_data/genes_SNP_count.csv?
What is the meaning of S_sites and N_sites? Why aren't they integers?
genes_SNP_count.csv:
IS_gene_info.tsv:
Thanks