Open modash opened 6 years ago
Hi, thanks for your comment! The most likely cause of that issue is that there are different numbers of columns in each dataset. The datasets can have different numbers of points along the rows, but must have the same number of features along the columns.
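To make that rule concrete, here is a minimal sketch of a pre-flight check you could run before loading anything. It is not part of SAUCIE: the helper name `check_same_features` and the dummy arrays are assumptions made up for illustration, and it only assumes each dataset is already a 2-D NumPy array with points in rows and features in columns.

```python
import numpy as np

def check_same_features(datasets):
    """Return the shared column count, or raise if the datasets disagree.

    `datasets` maps a label to a 2-D array (rows = points, columns = features).
    This helper is hypothetical and only illustrates the shape rule.
    """
    n_cols = {name: arr.shape[1] for name, arr in datasets.items()}
    if len(set(n_cols.values())) != 1:
        raise ValueError(f"Column counts differ across datasets: {n_cols}")
    return next(iter(n_cols.values()))

# Dummy data: different numbers of rows are fine, column counts must match.
a = np.zeros((341, 5))
b = np.zeros((553, 5))
shared = check_same_features({"a": a, "b": b})
```

If the check raises, the message lists the column count of each dataset, which should show which file is out of line.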
I would suggest first trying to load two datasets together and visualize them, following the method described in the Usage section and the example code in example.py. Right now that example only demonstrates clustering and visualization, but we will add an example of batch correction next!
Thank you for your reply. So if I understand correctly, the number of rows has to be constant but the number of columns can differ. In my dataset, I have three files, each containing 19743 rows, with 341, 553, and 358 columns. Is this okay? What file format is required? I have genes as rows and cells as columns, so the number of columns can vary. Also, will it accept row and column headers? When I have the gene name as the first column, I get an expect str and float error. Can you give me a basic format outline, such as the following?
Genes Cell1 Cell2 Cell3 Cell4 Cell5 Cell6
ERG 0 1 0 0 0 1
ATBP 1 5 8 15 2 0
Is this format okay? If not, please tell me what layout is expected.
@modash I think it would be better to have cells in the rows and genes in the columns, so that the number of genes (columns) is fixed and the number of cells (rows) varies between files.
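As a rough sketch of that reorientation, the snippet below takes the small genes-by-cells table quoted earlier in the thread, separates the text labels from the counts, and transposes the counts so that cells are rows and genes are columns. It uses plain Python and NumPy only. The variable names are made up for illustration, and it assumes the downstream loader wants a purely numeric matrix, which the thread does not confirm.

```python
import numpy as np

# The example table from the thread: genes in rows, cells in columns.
raw = """Genes Cell1 Cell2 Cell3 Cell4 Cell5 Cell6
ERG 0 1 0 0 0 1
ATBP 1 5 8 15 2 0
"""

rows = [line.split() for line in raw.strip().splitlines()]

# Keep the text labels separate so the matrix itself holds only numbers.
cell_names = rows[0][1:]
gene_names = [row[0] for row in rows[1:]]
genes_by_cells = np.array([row[1:] for row in rows[1:]], dtype=float)

# Transpose: one row per cell, one column per gene.
cells_by_genes = genes_by_cells.T
```

After the transpose, every file built from the same gene list has the same number of columns, however many cells it contains. Keeping the gene names out of the numeric array also avoids mixing strings with floats.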
Hi, I am trying to run SAUCIE and am getting the following error,
Also, it is showing training models as 0. A sample input format and some sample data would be nice. Here is the command I used: python SAUCIE.py --input_dir sample --output_dir sample_out --cluster