GuyAllard / markov_clustering

markov clustering in python
MIT License

sequence clustering #10

Closed by jspmccain 5 years ago

jspmccain commented 5 years ago

Super excited that you've put this work into a python MCL approach! Do you know if the input can be sequences? I'm assuming you first need to do a sequence similarity matrix?

Thanks!

GuyAllard commented 5 years ago

Hi Scott, the module does not work directly with sequences; it operates on similarity matrices, which you must generate using other tools. I use it for clustering DNA sequencing data, first using alignment tools to construct the similarity matrix.

Guy

jspmccain commented 5 years ago

Thanks Guy!

It would be really helpful to see how you've been going from the sequence similarity matrix to the input for run_mcl! I'm working on it now, but I feel like I'm not taking the most efficient approach.

Right now I'm taking BLAST bit scores (the ratio of bit score to self-hit bit score) to create an all-vs-all sequence similarity matrix. I'm then planning to write that to a numpy array, convert it to a networkx graph, and then convert the graph to a sparse adjacency matrix.
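
One possible shortcut is to skip the dense array and the networkx graph and build sparse-matrix triplets straight from the BLAST hits. The following is only a minimal sketch under stated assumptions, not the approach used by the library author: the function name `blast_hits_to_triplets` is hypothetical, the hits are assumed to be already parsed into `(query_id, subject_id, bit_score)` tuples, each score is divided by the query's self-hit score, and the two directions of a pair are symmetrised by taking the larger ratio.

```python
def blast_hits_to_triplets(hits):
    """Turn BLAST hits into symmetric (row, col, value) triplets.

    `hits` is an iterable of (query_id, subject_id, bit_score) tuples.
    Assumption: similarity = bit score / query self-hit bit score,
    symmetrised by keeping the larger of the two directional ratios.
    """
    # Keep only the best bit score for each ordered (query, subject) pair.
    best = {}
    for query, subject, score in hits:
        key = (query, subject)
        if score > best.get(key, 0.0):
            best[key] = score

    # Self-hit scores; sequences without a self-hit are dropped here.
    self_score = {q: sc for (q, s), sc in best.items() if q == s}
    ids = sorted(self_score)
    index = {name: i for i, name in enumerate(ids)}

    # Normalise each off-diagonal hit and symmetrise per unordered pair.
    sim = {}
    for (query, subject), score in best.items():
        if query == subject or query not in index or subject not in index:
            continue
        ratio = score / self_score[query]
        i, j = index[query], index[subject]
        pair = (min(i, j), max(i, j))
        sim[pair] = max(sim.get(pair, 0.0), ratio)

    # Emit both (i, j) and (j, i) so the matrix is symmetric.
    # Self-hits (the diagonal) are left out in this sketch.
    rows, cols, vals = [], [], []
    for (i, j), value in sorted(sim.items()):
        rows += [i, j]
        cols += [j, i]
        vals += [value, value]
    return ids, rows, cols, vals


# Made-up example hits, for illustration only.
example_hits = [
    ("a", "a", 100.0), ("b", "b", 200.0), ("c", "c", 50.0),
    ("a", "b", 80.0), ("b", "a", 80.0), ("a", "c", 10.0),
]
ids, rows, cols, vals = blast_hits_to_triplets(example_hits)
```

If scipy is available, the triplets can be passed to `scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(len(ids), len(ids)))`, and the resulting sparse matrix should be usable as the input to `run_mcl`. The `ids` list maps matrix indices back to sequence names when reading the clusters.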