KrishnaswamyLab / SAUCIE

ValueError: all the input array dimensions except for the concatenation axis must match exactly #6

Status: Open. modash opened this issue 6 years ago.

modash commented 6 years ago

Hi, I am trying to run SAUCIE and am getting the following error:

    Training batch correction models.
    Starting to train 10 batch correction models...
    Training model 0
    SAUCIE.py:111: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
      alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
    Traceback (most recent call last):
      File "SAUCIE.py", line 338, in <module>
        train_batch_correction(rawfiles)
      File "SAUCIE.py", line 128, in train_batch_correction
        raise(ex)
      File "SAUCIE.py", line 111, in train_batch_correction
        alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
    ValueError: all the input array dimensions except for the concatenation axis must match exactly
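The error is raised by NumPy itself: `np.concatenate(..., axis=0)` stacks arrays row-wise, so every other dimension (here, the number of columns) must agree. A minimal reproduction with toy data, assuming two small DataFrames as hypothetical stand-ins for `refx` and `nonrefx`; it also uses `.values`, which the FutureWarning recommends in place of the deprecated `.as_matrix()`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for refx / nonrefx: rows may differ, columns may not.
refx = pd.DataFrame(np.zeros((4, 3)))
nonrefx_ok = pd.DataFrame(np.ones((2, 3)))   # 3 columns, matches refx
nonrefx_bad = pd.DataFrame(np.ones((2, 5)))  # 5 columns, does not match

# .values replaces the deprecated .as_matrix()
alldata = np.concatenate([refx.values, nonrefx_ok.values], axis=0)
print(alldata.shape)  # (6, 3)

# Mismatched column counts reproduce the ValueError from the traceback.
try:
    np.concatenate([refx.values, nonrefx_bad.values], axis=0)
    err_msg = None
except ValueError as err:
    err_msg = str(err)
    print("ValueError:", err_msg)
```

The exact wording of the message varies slightly between NumPy versions, but the cause is the same.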

Also, it shows "Training model 0". A sample input format and some example data would be helpful. Here is the command I used:

    python SAUCIE.py --input_dir sample --output_dir sample_out --cluster

mattamodio commented 6 years ago

Hi, thanks for your comment! The most likely cause of that issue is that there are different numbers of columns in each dataset. The datasets can have different numbers of points along the rows, but must have the same number of features along the columns.
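Because the datasets are stacked row-wise, they must agree on the number of columns, and, for the stacked result to be meaningful, on column order as well (`.values` drops the labels, so concatenation is purely positional). A small pre-flight check can surface a mismatch before training starts. `check_same_columns` is a hypothetical helper written for illustration, not part of SAUCIE:

```python
import pandas as pd


def check_same_columns(frames):
    """Hypothetical helper (not part of SAUCIE): fail early with a readable
    message if the datasets do not share the same feature columns."""
    reference = list(frames[0].columns)
    for i, df in enumerate(frames[1:], start=1):
        if df.shape[1] != len(reference):
            raise ValueError(
                f"dataset {i} has {df.shape[1]} columns, expected {len(reference)}"
            )
        if list(df.columns) != reference:
            raise ValueError(f"dataset {i} has different column names or order")
    return reference


# Different numbers of rows are fine; the feature columns must match.
a = pd.DataFrame({"ERG": [1.0, 2.0], "ATBP": [0.5, 0.1]})
b = pd.DataFrame({"ERG": [3.0], "ATBP": [0.2]})
print(check_same_columns([a, b]))  # ['ERG', 'ATBP']
```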

I would suggest first loading two datasets together and visualizing them, following the method described in the Usage section and the example code in example.py. Right now that example only demonstrates clustering and visualization, but we will add a batch correction example next!

modash commented 6 years ago

Thank you for your reply. So if I understand correctly, the number of rows has to be constant but the number of columns can differ. In my dataset, I have three files, each containing 19,743 rows but with 341, 553, and 358 columns respectively. Is this okay? What file format is required? I have genes as rows and cells as columns, so the number of columns can vary. Also, will it accept row and column headers? When I have gene names as the first column, I get an "expect str and float" error. Could you give me a basic format outline, for example:

    Genes  Cell1  Cell2  Cell3  Cell4  Cell5  Cell6
    ERG    0      1      0      0      0      1
    ATBP   1      5      8      15     2      0

Is this format okay? If not, please tell me the required layout.
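A "str and float" error is consistent with the gene-name column being read as data rather than as labels. The thread does not spell out what SAUCIE's own loader expects, so the following is only a pandas sketch, assuming a whitespace-separated text file laid out like the example table, showing how headers can be kept as row and column labels so that only numeric values remain:

```python
import io

import pandas as pd

# The example table from this comment, as whitespace-separated text.
text = """Genes Cell1 Cell2 Cell3 Cell4 Cell5 Cell6
ERG 0 1 0 0 0 1
ATBP 1 5 8 15 2 0
"""

# header=0 (the default) uses the first line as column labels;
# index_col=0 turns the gene-name column into row labels.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", index_col=0)
print(df.shape)  # (2, 6)
print(df)
```

For a real file, `io.StringIO(text)` would be replaced by the file path.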

sameetmehta commented 4 years ago

@modash I think it would be better to have cells in the rows and genes in the columns, so that the number of genes (columns) is fixed and the number of cells (rows) varies across files.
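The suggested layout amounts to transposing each genes × cells file into cells × genes before stacking. A minimal sketch with small hypothetical matrices (the gene list, cell names and random counts are made up for illustration); `reindex(columns=genes)` is an assumption added here to force every file into the same gene order, and it would insert NaN for any gene missing from a file:

```python
import numpy as np
import pandas as pd

# Hypothetical genes x cells matrices, one per file: same genes, different cell counts.
genes = ["ERG", "ATBP", "GAPDH"]
rng = np.random.default_rng(0)
files = [
    pd.DataFrame(
        rng.poisson(2, size=(len(genes), n)),
        index=genes,
        columns=[f"s{i}_cell{j}" for j in range(n)],
    )
    for i, n in enumerate([4, 6, 5])
]

# Transpose so cells are rows and genes are columns, then align gene order.
cells_by_genes = [df.T.reindex(columns=genes) for df in files]

# Column counts now match, so row-wise concatenation succeeds.
alldata = np.concatenate([df.values for df in cells_by_genes], axis=0)
print(alldata.shape)  # (15, 3)
```

With the three files described earlier (341, 553 and 358 cells over 19,743 genes), the same steps would yield a 1,252 × 19,743 matrix.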