JustinChu / ntsm

This tool counts the number of specific k-mers within sequence data. The counts can then be compared to other counts to compute the probability that samples are of the same origin, in order to discover incongruent samples or sample swaps.
MIT License

--all flag does not give all output comparisons if using pre-built pca data during evaluation #6

Closed sjneph closed 2 weeks ago

sjneph commented 2 weeks ago

Assuming $fs is an array of result files from running ntsmCount:

// --all gives all pairwise comparisons as expected
ntsmEval --all -t 16 $fs > summary.tsv

// --all only gives pairwise comparisons that are determined to be from the same sample
// centered=data/human_sites_center.txt and
// mtx=data/human_sites_rotationMat.tsv
ntsmEval --all -t 16 -n $centered -p $mtx $fs > summary.tsv

JustinChu commented 2 weeks ago

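With PCA projection, only samples that seem similar in PCA space are compared, so -a outputs every pair that was actually compared. That can include pairs that are not related or not from the same sample, but it will not include pairs that were never compared.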

If I'm interpreting your question correctly this is related to https://github.com/JustinChu/ntsm/issues/5

Let me know if that helps or if you need more clarification on how -a works with the PCA heuristic.

*Edit: I edited the README slightly to include this information.
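To illustrate why a PCA pre-filter yields fewer rows than a full all-pairs run, here is a minimal Python sketch. It is not ntsm's implementation: the function names, the toy 2-D coordinates, and the `radius` cutoff are all hypothetical, chosen only to show that pairs far apart in the projected space are never compared and so cannot appear in the output.

```python
from itertools import combinations
from math import dist


def all_pairs(names):
    """Every unordered pair of samples (the behaviour without a PCA pre-filter)."""
    return list(combinations(sorted(names), 2))


def candidate_pairs(coords, radius):
    """Pairs whose projected coordinates lie within `radius` of each other.

    `coords` maps sample name -> tuple of projected coordinates.
    `radius` is a hypothetical cutoff used only for this illustration.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(coords), 2)
        if dist(coords[a], coords[b]) <= radius
    ]


# Toy projected coordinates (assumed values, not real ntsm output).
coords = {
    "s1": (0.0, 0.0),
    "s2": (0.1, 0.0),
    "s3": (5.0, 5.0),
}

print(len(all_pairs(coords)))        # number of pairs without the pre-filter
print(candidate_pairs(coords, 0.5))  # pairs that survive the pre-filter
```

In this toy example only the s1/s2 pair is close enough to be compared, so a flag that reports "all comparisons" would report one row rather than three.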

sjneph commented 2 weeks ago

Okay, #5 helped my understanding. I'll close out this issue, thanks.