QihuangZhang / CeLEry

CeLEry: cell location recovery in single-cell RNA sequencing
MIT License

Predicted coordinates are only in the center #6

Open manfred-seiwald opened 6 months ago

manfred-seiwald commented 6 months ago

Problem: I used CeLEry for deconvolution of the CytoSPACE dataset, but found that the predicted coordinates stay in the center of the reference coordinates (see image).

Explanation: I calculated the predicted spots as follows:

def compute_spot_coords(self, pred_coords: np.ndarray, spot_locations: np.ndarray)->np.ndarray:
    # Row = x, Col = y
    xmin = spot_locations[:,0].min()
    xmax = spot_locations[:,0].max()
    ymin = spot_locations[:,1].min()
    ymax = spot_locations[:,1].max()
    x_spot_coords = (pred_coords[:,0] * (xmax - xmin) + xmin).astype(np.int32)
    y_spot_coords = (pred_coords[:,1] * (ymax - ymin) + ymin).astype(np.int32)
    spot_coords = np.stack([x_spot_coords, y_spot_coords], 1)
    return spot_coords

where the prediction is done by

spot_locations = spatial_rna.obs[['Row', 'Col']]
# create coordinate model from spatial_rna
celery.Fit_cord(data_train=spatial_rna, location_data=spot_locations, hidden_dims = [30, 25, 15],  num_epochs_max = 20, path = results_dir + '/temp', filename = 'coord_model')
# predict coordinates with model for each reference cell in ref_scrna, result is array (num_cells, 2), the coordinates are normalized 0..1
pred_coords = celery.Predict_cord(data_test=ref_scrna, path = results_dir + '/temp', filename = 'coord_model')

Question: Can you think of a reason why most of the predicted locations are clustered in the center instead of being well distributed?

[image: blue = spot_locations, orange = pred_coords]
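Before comparing plots, the collapse described above can be put into a number by comparing the per-axis spread of the predictions with the spread of the training spots. The following is only a minimal sketch: `spread_ratio` is a hypothetical helper, not part of CeLEry, and it assumes (as stated in the comment above) that `pred_coords` is normalized to 0..1 while `spot_locations` is an `(n, 2)` array or DataFrame in the original Row/Col units. The data here are synthetic.

```python
import numpy as np

def spread_ratio(pred_coords, spot_locations):
    """Per-axis std of predictions divided by std of min-max-normalized spots.

    Hypothetical diagnostic, not a CeLEry function. Values near 1 mean the
    predictions cover the tissue about as widely as the spots do; values
    near 0 mean the predictions have collapsed toward a single point.
    """
    pred = np.asarray(pred_coords, dtype=float)
    spots = np.asarray(spot_locations, dtype=float)
    lo = spots.min(axis=0)
    span = spots.max(axis=0) - lo
    spots_norm = (spots - lo) / span          # bring spots to the 0..1 range
    return pred.std(axis=0) / spots_norm.std(axis=0)

# Synthetic illustration only: spots spread over a 100 x 100 grid area,
# predictions tightly clustered around the center (0.5, 0.5).
rng = np.random.default_rng(0)
spots = rng.uniform(0, 100, size=(500, 2))
collapsed = 0.5 + 0.02 * rng.standard_normal((300, 2))

ratio = spread_ratio(collapsed, spots)
print(ratio)  # both entries are far below 1 for collapsed predictions
```

Tracking this ratio across training runs gives a quick check of whether a change in hyperparameters actually widens the predicted distribution.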

QihuangZhang commented 4 months ago

It looks like the model is severely underfitted. Could you try increasing the maximum number of epochs (20 might be too small)? You could also explore the other hyperparameters related to model fitting. We added a short section to the tutorial to illustrate their usage:

https://github.com/QihuangZhang/CeLEry/blob/main/tutorial/tutorial.md#training-tips--tuning-matters-a-lot
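As a generic illustration of why too few epochs can produce centered predictions: a regressor trained with squared error tends to learn the mean of the targets first and the input-dependent structure only later. The toy below is a sketch under stated assumptions, using plain NumPy gradient descent on a synthetic linear problem. It is not CeLEry's network or training loop, and `fit_linear` and all data are illustrative.

```python
import numpy as np

def fit_linear(X, y, epochs, lr=0.1):
    """Full-batch gradient descent on mean squared error, starting from zero."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y
        grad_w = 2.0 * (X.T @ err) / len(y)
        grad_b = 2.0 * err.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal((400, 5))        # weakly scaled features
w_true = np.array([1.0, -0.5, 0.8, 0.3, -1.2])
y = 0.5 + X @ w_true                           # targets centered at 0.5

spread = {}
for epochs in (20, 5000):
    w, b = fit_linear(X, y, epochs)
    pred = X @ w + b
    # Ratio of predicted spread to true spread: small means "stuck in the center".
    spread[epochs] = pred.std() / y.std()
    print(epochs, spread[epochs])
```

In this toy, the short run already places its predictions near the target mean but with only a small fraction of the true spread, while the long run recovers almost all of it. Whether the same mechanism explains the plot in this issue depends on the actual model and data.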

manfred-seiwald commented 4 months ago

Thank you very much for the answer. I will run the tests in the next few days and report back.

manfred-seiwald commented 4 months ago

I have tested several variants; here are the results:

If you want, I can provide the data, which are taken from CytoSPACE: the spatial file is 452 MB and the reference file is 4.4 GB.

[image: pred_spots_lr_0 0001_batch_32]