FOI-Bioinformatics / CanSNPer2

CanSNPer2: A toolkit for SNP-typing bacterial genomes.
GNU General Public License v3.0

summary file: missing results #24

Open Gilles179 opened 4 years ago

Gilles179 commented 4 years ago

As part of setting up a new scheme, it is convenient to test a database on multiple assemblies, using the --summary option in combination with the -o option to get a list of assignments. However, when analysing a few hundred assemblies, I get a summary file with only 53 to 57 lines instead of the expected number, whereas the whole list is printed on the screen.

CarolineOhrman commented 4 years ago

I have also experienced that the summary file does not catch all assemblies. Out of my 283 genomes, only 117 ended up in the summary file.

As it is now, the summary function is implemented to include only assemblies where a final SNP call was made.

I would instead like to have all genomes in the summary file. If I run 283 genomes, all of them should be in the summary even if the final call is NA. For the summary I propose three columns: ID, final_snp and snp_path. The results are then easy to cut and paste alongside other data; see the example below. It is also easy to spot the final SNPs that ended up as NA and investigate what the issue is.

ID            final_snp  snp_path
NIH_B_38      A.II.2     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2
WY_00W4114    A.II.4     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2;A.II.6;A.II.3;A.II.4
WY96          A.II.4     T/N.1;T.1;A/M.1;A.1;A.II.1;A.II.2;A.II.6;A.II.3;A.II.4
O_HARA        NA         T/N.1;T.1;B.1;B.16;B.218;B.219;A.II.3;A.II.4

I also propose to change snp_summary.txt to summary_snp.txt to match summary_tree.pdf
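
To illustrate the proposed layout, here is a minimal sketch of a writer that emits one row per genome and writes NA when no final call was made. It is not CanSNPer2's actual code: the function name `write_summary` and the `(ID, final_snp, snp_path)` tuple structure are assumptions made for this example only.

```python
import csv
import io


def write_summary(results, handle):
    """Write one tab-separated row per genome, including genomes without a final call.

    `results` is a hypothetical list of (genome_id, final_snp, snp_path) tuples,
    where final_snp is None when no final call was made.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "final_snp", "snp_path"])
    for genome_id, final_snp, snp_path in results:
        # Keep the genome in the summary and mark a missing call as NA.
        writer.writerow([genome_id, final_snp or "NA", ";".join(snp_path)])


results = [
    ("NIH_B_38", "A.II.2", ["T/N.1", "T.1", "A/M.1", "A.1", "A.II.1", "A.II.2"]),
    ("O_HARA", None, ["T/N.1", "T.1", "B.1", "B.16", "B.218", "B.219", "A.II.3", "A.II.4"]),
]

buffer = io.StringIO()  # stands in for an open summary_snp.txt file
write_summary(results, buffer)
summary_text = buffer.getvalue()
```

Writing the NA rows rather than filtering them out keeps the number of summary lines equal to the number of input genomes, which makes a missing assembly obvious.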

Gilles179 commented 4 years ago

I fully agree, that would be great. Regarding what is currently included in the summary file: in a run with 400+ genomes, only 50+ end up in the summary, even though only about 10 genomes are classified as NA.
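
To see which assemblies are being dropped, one could compare the input file names against the IDs in the summary. The sketch below assumes that the summary is tab-separated with the ID in the first column and that the ID equals the assembly file name without its extension. Both are assumptions, not confirmed behaviour of CanSNPer2, and `missing_from_summary` is a hypothetical helper.

```python
from pathlib import Path


def missing_from_summary(assembly_paths, summary_lines):
    """Return the assembly names that have no row in the summary.

    Assumes the first tab-separated field of each summary line is the ID,
    and that the ID matches the assembly file stem (both unverified).
    """
    summarised = {line.split("\t")[0].strip() for line in summary_lines if line.strip()}
    expected = [Path(path).stem for path in assembly_paths]
    return [name for name in expected if name not in summarised]


# Illustrative inputs only; the paths and rows are made up for this example.
assemblies = ["genomes/NIH_B_38.fasta", "genomes/WY96.fasta", "genomes/O_HARA.fasta"]
summary = ["NIH_B_38\tA.II.2", "WY96\tA.II.4"]
missing = missing_from_summary(assemblies, summary)
```

If the missing names include genomes that did receive a final call on screen, the filter is dropping more than just the NA cases.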