ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Generate distance matrix rather than pairwise #5

Closed: tseemann closed this issue 6 years ago

tseemann commented 6 years ago

Would it be possible to add an option to output a distance matrix in TSV or CSV instead of the pairwise list?

    A    B    C  
A   100  83   71 
B       100   92
C            100

It could be upper triangle, lower triangle, or both.
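
A minimal Python sketch of the requested layout, assuming FastANI's pairwise results have already been parsed into (query, reference, ANI) tuples. The function `pairs_to_tsv` and its `triangle` argument are hypothetical illustrations, not FastANI options. Each cell takes the value reported with the row genome as query, and missing pairs are left empty.

```python
# Hypothetical helper, not part of FastANI: pairwise list -> TSV matrix.
def pairs_to_tsv(pairs, triangle="both"):
    """Render (query, reference, ani) tuples as a TSV matrix.

    triangle: "upper", "lower", or "both" (hypothetical option names).
    Cells outside the chosen triangle, and pairs with no reported value,
    are left empty.
    """
    ani = {(q, r): v for q, r, v in pairs}
    names = sorted({q for q, _, _ in pairs} | {r for _, r, _ in pairs})
    lines = ["\t" + "\t".join(names)]  # header row: blank corner + names
    for i, row in enumerate(names):
        cells = []
        for j, col in enumerate(names):
            keep = (triangle == "both"
                    or (triangle == "upper" and j >= i)
                    or (triangle == "lower" and j <= i))
            value = ani.get((row, col)) if keep else None
            cells.append("" if value is None else str(value))
        lines.append(row + "\t" + "\t".join(cells))
    return "\n".join(lines)
```

With the values from the example table, `pairs_to_tsv(pairs, "upper")` leaves the cells below the diagonal empty, giving the same shape as the table above.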

cjain7 commented 6 years ago

Yes, this should be easy, will add it soon. Thanks for the feedback.

tseemann commented 6 years ago

Assuming you haven't already done this, it should be output as a PHYLIP distance matrix. These can be in lower triangle form, or full matrix.

For a full matrix, the values need to be symmetrical, so you will need to average A vs B and B vs A, as fastANI is not symmetric?

For lower triangle, just use whatever is in query?
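
Since FastANI's query-vs-reference and reference-vs-query values can differ, the averaging suggested here can be sketched as below. This assumes results are held in a dict keyed by (query, reference); `symmetrize` is a hypothetical helper, and falling back to the single reported direction when the reverse one is absent is an assumption, not documented FastANI behavior.

```python
# Hypothetical helper, not part of FastANI: make an asymmetric table symmetric.
def symmetrize(ani):
    """Average the two directions of an asymmetric ANI table.

    ani maps (query, reference) -> ANI. Both (a, b) and (b, a) in the
    result hold the mean of the two directions; if only one direction
    is present (an assumed fallback), that single value is used.
    """
    sym = {}
    for (a, b), v in ani.items():
        back = ani.get((b, a))
        mean = v if back is None else (v + back) / 2
        sym[(a, b)] = sym[(b, a)] = mean
    return sym
```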

The PHYLIP format is:

    4
    A
    B  33
    C  12  99
    D  25  87  8

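
A sketch of a writer for the lower-triangle layout above (no diagonal, whitespace-separated). `to_phylip_lower` is a hypothetical name. It assumes `dist` already holds distances keyed by (row, column), so ANI values would first need converting (100 - ANI is one common choice, assumed here rather than taken from FastANI). Strict PHYLIP pads each name to 10 characters, which this sketch does not do.

```python
# Hypothetical helper, not part of FastANI: write a lower-triangle PHYLIP matrix.
def to_phylip_lower(names, dist):
    """Return a lower-triangular PHYLIP distance matrix as a string.

    names: taxon labels in output order.
    dist: maps (row, col) -> distance; only pairs where col comes
    before row are read, so the diagonal is omitted.
    """
    lines = [str(len(names))]  # first line: number of taxa
    for i, row in enumerate(names):
        cells = [str(dist[(row, names[j])]) for j in range(i)]
        lines.append("  ".join([row] + cells))
    return "\n".join(lines)
```

Fed the values from the example, it reproduces the five lines shown above.
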
cjain7 commented 6 years ago

Yes, thanks. Miguel, who is a co-author of FastANI, also pointed me to the PHYLIP format for this purpose. I hope to add this soon, along with a multi-threaded execution feature.

tseemann commented 6 years ago

Looking forward to it! I never managed to convince Brian at Mash to do it: https://github.com/marbl/Mash/issues/9 CC: @schultzm

cjain7 commented 6 years ago

Hi, an option to output a matrix is available in the latest version; please check when you get a chance.

tseemann commented 6 years ago

@cjain7 is the matrix the default output now? or is there a command line option? I can't see "matrix" anywhere in the README.

fmaguire commented 6 years ago

@tseemann Looks like it is now a command line option --matrix (outputs a $output_filename.matrix file)
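
A sketch for reading the `.matrix` file back into Python, assuming it follows the lower-triangle PHYLIP layout discussed earlier in this thread: a count line, then one line per genome with its name followed by its values against all earlier genomes. `read_lower_matrix` is hypothetical, and treating non-numeric cells such as `NA` as missing is an assumption about the file contents. Splitting on whitespace also assumes genome names contain no spaces.

```python
# Hypothetical reader; the file layout is assumed, not taken from FastANI docs.
def read_lower_matrix(text):
    """Parse a lower-triangular PHYLIP-style matrix into (names, rows).

    Row i of the result holds i values (one per earlier genome).
    Non-numeric cells (e.g. "NA") become None.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0])  # first line: number of genomes
    names, rows = [], []
    for i, line in enumerate(lines[1:n + 1]):
        name, *cells = line.split()
        if len(cells) != i:
            raise ValueError(f"line for {name!r} has {len(cells)} values, expected {i}")
        names.append(name)
        rows.append([_to_float(c) for c in cells])
    return names, rows


def _to_float(cell):
    """Convert a cell to float, or None if it is not numeric."""
    try:
        return float(cell)
    except ValueError:
        return None
```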

cjain7 commented 6 years ago

@fmaguire is right. I've added this info to README now. Thanks!

vinisalazar commented 5 years ago

Hi, is it possible to reopen this issue to include a redundant (full symmetric) matrix? Or could anyone suggest how to transform the lower-triangular output into a redundant or upper-triangular matrix?

Thank you for any assistance you can provide.

cjain7 commented 5 years ago

@vinisalazar If you are using some scripting language like R, there should be easy ways to convert a lower triangular matrix to upper triangular. I would expect you could first create a full symmetric matrix from a lower matrix, and then set lower values to 0.
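
The two steps described above can be sketched in Python as well (the same idea applies in R). This assumes the lower triangle has been loaded as a list of rows, where row i holds the i values against earlier genomes and no diagonal is stored; `lower_to_full` and `full_to_upper` are hypothetical helpers, and filling the diagonal with 100 (self-ANI) is an assumption.

```python
# Hypothetical helpers, not part of FastANI.
def lower_to_full(rows, diagonal=100.0):
    """Expand lower-triangular rows (row i holds i values) into a full
    symmetric matrix. The diagonal is not stored in the lower-triangular
    form, so it is filled with `diagonal` (assumed 100 for self-ANI)."""
    n = len(rows)
    full = [[diagonal] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = value  # mirror across the diagonal
    return full


def full_to_upper(full, fill=0.0):
    """Keep the diagonal and above; set entries below it to `fill`."""
    return [[v if j >= i else fill for j, v in enumerate(row)]
            for i, row in enumerate(full)]
```

`lower_to_full` alone gives the redundant matrix; passing its result to `full_to_upper` gives the upper-triangular form with zeros below the diagonal.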

Optionally, if you are familiar with C/C++, you can modify the source code; in particular the last ten or so instructions of the function outputPhylip in file fastANI/src/cgi/include/computeCoreIdentity.hpp should do the job.

vinisalazar commented 5 years ago

> If you are using some scripting language like R, there should be easy ways to convert a lower triangular matrix to upper triangular.

So I thought, but I really could not find a tested function/library to do so. Thank you for the help regardless.