SchlossLab / schtools

Schloss Lab Tools for Reproducible Microbiome Research 💩
http://www.schlosslab.org/schtools

Refactor read_dist() #13

Open kelly-sovacool opened 4 years ago

kelly-sovacool commented 4 years ago

read_dist() should check whether the file provided is a lower triangular distance matrix, a square matrix, or not a valid format.

Details on phylip-formatted distance matrices: https://mothur.org/wiki/phylip-formatted_distance_matrix/#:~:text=The%20basic%20format%20of%20a,sequence%20to%20the%20other%20sequences.
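A minimal sketch of what such a check could look like, assuming the phylip layout described on the mothur wiki page above: the first line holds the number of sequences, and each following line is a label followed by whitespace-separated distances. The name detect_dist_format() and its return values are hypothetical and not part of schtools.

```r
# Hypothetical helper (not part of schtools): classify a phylip-formatted
# distance file as "lower_triangle", "square", or "invalid".
detect_dist_format <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]  # ignore blank lines
  if (length(lines) < 2) return("invalid")

  # The header must be a single positive integer: the number of sequences.
  header <- trimws(lines[1])
  if (!grepl("^[0-9]+$", header)) return("invalid")
  n <- as.integer(header)
  if (n < 1 || length(lines) != n + 1) return("invalid")

  # Count the distance values on each row (fields minus the label).
  n_values <- lengths(strsplit(trimws(lines[-1]), "[[:space:]]+")) - 1L

  if (all(n_values == n)) return("square")
  # Row i of a lower triangular matrix has i - 1 distances.
  if (all(n_values == seq_len(n) - 1L)) return("lower_triangle")
  "invalid"
}
```

Under these assumptions, a file containing the lines "3", "A", "B 0.1", "C 0.2 0.3" would be classified as lower triangular, because row i carries i - 1 distances.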

Note: phyloseq::import_mothur_dist() exists, but we should have our own version because:

kelly-sovacool commented 3 years ago

@NLesniak I wonder if read_dist() should have a more specific name, since it only handles phylip-formatted lower triangular matrices and not square matrices or mothur's column format? e.g. read_mtx_phylip() or something like that?

NLesniak commented 3 years ago

@kelly-sovacool do you think we should rename this and have a specific function for each or add the ability to read different input types to this function?

kelly-sovacool commented 3 years ago

@NLesniak perhaps both? For now maybe we should rename this one to something phylip-specific. If we want/need to add support for other file types, we would add those as functions following a similar naming scheme. Potentially then read_dist() could become a wrapper around those functions with a file type parameter.
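A rough sketch of how that wrapper idea might look, under stated assumptions: the reader names read_dist_phylip_lt() and read_dist_phylip_sq(), the helper parse_phylip_rows(), and the type values are all hypothetical, and returning a base R matrix is a choice made for this example rather than the package's actual return type. Mothur's column format is left out.

```r
# Hypothetical shared parser: returns the row labels and the numeric values
# found on each row of a phylip-formatted file.
parse_phylip_rows <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]
  n <- suppressWarnings(as.integer(trimws(lines[1])))
  fields <- strsplit(trimws(lines[-1]), "[[:space:]]+")
  stopifnot(!is.na(n), n >= 1, length(fields) == n)
  list(
    labels = vapply(fields, function(f) f[1], character(1)),
    values = lapply(fields, function(f) as.numeric(f[-1]))
  )
}

# Hypothetical reader for lower triangular files; mirrors the triangle to
# return a full symmetric matrix with a zero diagonal.
read_dist_phylip_lt <- function(path) {
  rows <- parse_phylip_rows(path)
  n <- length(rows$labels)
  m <- matrix(0, n, n, dimnames = list(rows$labels, rows$labels))
  for (i in seq_len(n)) {
    if (length(rows$values[[i]]) != i - 1) {
      stop("row ", i, " does not have ", i - 1, " distances")
    }
    if (i > 1) m[i, seq_len(i - 1)] <- rows$values[[i]]
  }
  m + t(m)
}

# Hypothetical reader for square files.
read_dist_phylip_sq <- function(path) {
  rows <- parse_phylip_rows(path)
  n <- length(rows$labels)
  if (any(lengths(rows$values) != n)) stop("matrix is not square")
  m <- do.call(rbind, rows$values)
  dimnames(m) <- list(rows$labels, rows$labels)
  m
}

# read_dist() as a thin wrapper that dispatches on a file type parameter.
read_dist <- function(path, type = c("phylip_lt", "phylip_square")) {
  type <- match.arg(type)
  switch(type,
    phylip_lt = read_dist_phylip_lt(path),
    phylip_square = read_dist_phylip_sq(path)
  )
}
```

With this layout, supporting another file type would mean adding one reader function and one entry in the switch(), while existing calls to read_dist() keep the default behavior.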

kelly-sovacool commented 1 year ago

Pat's code to read mothur's lower triangular distance file as a matrix: https://github.com/riffomonas/distances/blob/f5cb11b7d8c5a900249c5e676269699411f0092a/code/read_matrix.R