cozygene / FEAST

Fast expectation maximization for microbial source tracking

Count table orientation #25

Open cyklee opened 3 years ago

cyklee commented 3 years ago

I think there's some potential for confusion regarding sample vs. taxa orientation.

In issue #15 you mentioned that "FEAST is expecting the taxa in the rows and the samples in the columns." Yet the documentation for Load_CountMatrix() says: "The row names are the unique sample ids. The column names are the unique taxa ids."

In README:

"An m by n count matrix, where m is the number samples and n is the number of taxa. Row names are the sample ids ('SampleID'). Column names are the taxa ids." However, the example table shown below that text has the opposite orientation (the same orientation as the example OTU tables).

Could you kindly clarify these conflicting statements?

Adding to what @sklasek said in #15 - running FEAST() using the orientation suggested by the example files (taxa as rows) results in "Error in sprintf("Error: there are %d sample ids in common ") :", whereas using taxa as columns appears to work.

EDIT: I think I've solved the mystery by following the example script in the vignette. It appears that Load_CountMatrix() does the transposition, i.e. it converts a file with taxa as rows into a matrix with taxa as columns. This explains what we see in the README. So, for those of us who load a count table directly rather than through Load_CountMatrix(), I believe you should pass FEAST() a count matrix with samples as rows and taxa as columns.
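For anyone loading a table by hand, here is a minimal base-R sketch of the orientation fix described above. The toy matrix, the taxon and sample names, and `metadata_ids` are all made up for illustration. The final check only assumes, based on the error message quoted earlier, that sample ids in the count matrix row names need to match those in the metadata. It does not call any FEAST function.

```r
# Hypothetical toy count table with taxa in rows and samples in columns,
# i.e. the orientation of the example OTU tables.
otus <- matrix(
  c(10, 0, 3,
     5, 2, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("taxon_1", "taxon_2"),
                  c("sample_A", "sample_B", "sample_C"))
)

# Transpose so that samples are rows and taxa are columns.
counts <- t(otus)

# Sanity checks: row names are now sample ids, column names are taxa ids.
stopifnot(identical(rownames(counts), colnames(otus)))
stopifnot(identical(colnames(counts), rownames(otus)))

# Hypothetical metadata sample ids; count how many are shared with the
# count matrix (assumption: these are expected to overlap).
metadata_ids <- c("sample_A", "sample_B", "sample_C")
n_common <- length(intersect(rownames(counts), metadata_ids))
```

If `n_common` is zero, the table is probably still in the taxa-as-rows orientation or the sample ids are spelled differently in the two inputs.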

I'm leaving this open in the hope that this can be clarified in the documentation, particularly for users less familiar with the "m by n" notation for matrices, and so that others with similar questions can find it.

NeginValizadegan commented 3 years ago

I agree with this.