Google drive resources #2 - OmicsML/SpatialCTD - GitHub issues

arnoldchang1 commented 1 year ago

Hello, thank you for posting the code for your paper. I am trying to validate the results on our end. Is it possible to share the Google Drive resources?

arnoldchang1 commented 1 year ago

In addition, we were wondering which specific combinations of samples were taken from each of the lung, kidney, and liver datasets to obtain the 4 experimental settings. For instance, the paper states that "Specifically, on the kidney tissue, there are 4, 2, 6 and 8 experiments conducted and evaluated in Setting 1, 2, 3 and 4 respectively..." Of the 10 kidney samples, which combinations of 3 samples (2 for reference and 1 for query) form Settings 1-4?

JiayuanDing100 commented 1 year ago

Hi Arnold @arnoldchang1,

Thanks for your interest in our work.

For the first question, all input files needed by the Jupyter notebooks under the GNNDeconvolver folder can be found at https://github.com/OmicsML/SpatialCTD/tree/main/SpatialCTD_dataset.

For the question about the combinations of samples under each setting, thanks for pointing this out. I have put the sample combinations for each setting across the lung, liver, and kidney datasets here: https://github.com/OmicsML/SpatialCTD/tree/main/setting

If you have any questions, please let me know.

arnoldchang1 commented 1 year ago

Thank you for providing the setting file!

I was wondering exactly how hyperparameter tuning was performed to select the best combination of learning rate and weight decay. In the final Excel output, res2.xlsx, there are "min" and "select" values for jsd, mae, mse, and pcc. What does "select" mean? And to obtain the final plots (for example, Figure 4), is it just the combination of learning rate + weight decay that results in the lowest jsd, mae, mse, pcc on the validation set? If so, would this imply that the optimal combination of learning rate + weight decay might differ across jsd, mae, mse, and pcc?

arnoldchang1 commented 1 year ago

Also, is it possible to provide the links to the original single-cell datasets prior to gridding? Thank you.

JiayuanDing100 commented 1 year ago

Hi Arnold @arnoldchang1,

Thanks for pointing this out. I should give more clarification on the notebook for reproduction.

Yes, the res2.xlsx file is the final result, where the performance of every hyperparameter combination we tried is cached.

Please use the "select" values for mae, mse, and pcc for reference, and ignore the "min" columns. "select" means that, among several epochs, the model that performs best on the validation datasets is selected and saved as the final model.
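To make the "select" idea concrete, here is a minimal sketch of picking the epoch by validation performance and then reading off the test metric at that epoch. The record keys (`val_mse`, `test_mse`), the numbers, and the `select_epoch` helper are illustrative assumptions, not taken from the notebooks, and the reading of "min" as the smallest test value over all epochs is only a guess.

```python
from typing import Dict, List


def select_epoch(history: List[Dict[str, float]], metric: str = "val_mse") -> Dict[str, float]:
    """Return the record of the epoch with the smallest validation metric."""
    return min(history, key=lambda rec: rec[metric])


# Hypothetical per-epoch records for one (learning rate, weight decay) run.
history = [
    {"epoch": 1, "val_mse": 0.030, "test_mse": 0.034},
    {"epoch": 2, "val_mse": 0.021, "test_mse": 0.026},
    {"epoch": 3, "val_mse": 0.024, "test_mse": 0.023},
]

selected = select_epoch(history)

# "select": the test metric of the epoch chosen on the validation data.
select_test_mse = selected["test_mse"]

# "min" (assumed meaning): the smallest test metric over all epochs,
# regardless of validation performance.
min_test_mse = min(rec["test_mse"] for rec in history)

print(selected["epoch"], select_test_mse, min_test_mse)
```

In this toy run the two values differ, which would be one reason to report the "select" columns rather than "min".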

Yes, the final plot/result uses the combination of learning rate + weight decay that results in the lowest mae, mse and pcc on the test datasets, not the validation datasets. To be specific, please refer to "[info]. selecttest-mae", "[info]. selecttest-mse", and "[info]. selecttest-pcc" for the final plot results.

Basically, the hyperparameter combination that obtains the smallest mse is also the best parameter combination for mae and pcc. I have verified this in our experiments.
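For anyone reproducing this, a rough sketch of that check on a results table follows. The rows, the key names (`select_test_mae` and so on), and the `best_by` helper are hypothetical stand-ins for the columns in res2.xlsx, and treating pcc as higher-is-better is an assumption based on it being a correlation coefficient.

```python
from typing import Dict, List, Tuple

# Hypothetical rows, one per (learning rate, weight decay) combination.
rows: List[Dict[str, float]] = [
    {"lr": 0.01, "weight_decay": 0.0001,
     "select_test_mae": 0.052, "select_test_mse": 0.0041, "select_test_pcc": 0.78},
    {"lr": 0.001, "weight_decay": 0.0001,
     "select_test_mae": 0.047, "select_test_mse": 0.0033, "select_test_pcc": 0.84},
    {"lr": 0.001, "weight_decay": 0.0,
     "select_test_mae": 0.049, "select_test_mse": 0.0036, "select_test_pcc": 0.82},
]


def best_by(table: List[Dict[str, float]], column: str,
            higher_is_better: bool = False) -> Dict[str, float]:
    """Return the row with the best value in the given column."""
    pick = max if higher_is_better else min
    return pick(table, key=lambda row: row[column])


def config(row: Dict[str, float]) -> Tuple[float, float]:
    """Identify a row by its hyperparameter combination."""
    return (row["lr"], row["weight_decay"])


best_mse = best_by(rows, "select_test_mse")
best_mae = best_by(rows, "select_test_mae")
# Assumption: pcc is a correlation, so the best value is the largest one.
best_pcc = best_by(rows, "select_test_pcc", higher_is_better=True)

# True when a single combination is best for all three metrics.
metrics_agree = config(best_mse) == config(best_mae) == config(best_pcc)
print(config(best_mse), metrics_agree)
```

If `metrics_agree` came out false on real results, the choice of metric used for selection would need to be stated alongside the plots.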

For the single-cell resolution CosMx datasets, you can access liver and lung here: https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/. They are public now. For the CosMx kidney dataset, unfortunately, we are not able to release it until our paper is accepted.

Please let me know if you have further questions about reproduction.

Best, Jiayuan

arnoldchang1 commented 1 year ago

Thank you very much, that is very clear! By the way, do you have the links to download the h5ad files for the datasets?
