Acare / hacksig

A Tidy Framework to Hack Gene Expression Signatures
https://acare.github.io/hacksig/

Trouble Replicating Results from Original Paper #1

Open niuniueiko opened 2 months ago

niuniueiko commented 2 months ago

Hi Andrea,

First of all, thank you for developing this amazing tool for CINSARC analysis! I am using this package to test CINSARC as a tool for sarcoma prognosis, and therefore I want to validate its efficacy by replicating the results from the original paper (Chibon et al., 2010). However, there is no algorithm or code provided that details how the nearest centroid method was applied. Thus, your package has become a lifesaver, especially for someone with a background in biology.

I preprocessed the data by normalizing it with the GCRMA method, as stated in the original paper. I then annotated the Affymetrix probe sets and summarized their expression to the gene level (using limma::avereps()). I provided the arguments required by your package (a normalized expression matrix and a vector containing the metastasis status of each sample) and successfully ran the hack_cinsarc() function. However, the CINSARC prognosis result does not agree with the published result (only 58% agreement), which I obtained from the phenodata in the GEO expression matrix. I am wondering at which step I might have gone wrong. Is there a separate training and validation step required?

Thank you in advance!

Best regards, EZ

Acare commented 1 month ago

Hi, thanks for using the package.

Yes, I think you can't replicate the results of the original paper with the hack_cinsarc() function, because the original analysis is based on a training-validation procedure.

The hack_cinsarc() function implements the LOOCV method from this more recent paper.
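To make the difference concrete, here is a minimal sketch of the two procedures on toy data. It is not the code of hack_cinsarc() or of either paper: the function names, the toy values, and the use of Euclidean distance are all assumptions made for illustration, and the real method may use a different distance and the actual CINSARC gene set.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Per-gene mean over a list of samples (each sample is a list of gene values)."""
    return [fmean(col) for col in zip(*rows)]


def nearest_label(sample, centroids):
    """Label of the closest centroid. Euclidean distance is an assumption here."""
    return min(centroids, key=lambda label: dist(sample, centroids[label]))


def loocv_nearest_centroid(samples, status):
    """Leave-one-out: classify each sample with centroids built from all other samples."""
    predictions = []
    for i, sample in enumerate(samples):
        centroids = {}
        for label in set(status):
            others = [s for j, s in enumerate(samples) if j != i and status[j] == label]
            if others:  # skip a class that has no remaining samples
                centroids[label] = centroid(others)
        predictions.append(nearest_label(sample, centroids))
    return predictions


def train_validate_nearest_centroid(train, train_status, validation):
    """Fixed split: centroids come from the training set only."""
    centroids = {
        label: centroid([s for s, y in zip(train, train_status) if y == label])
        for label in set(train_status)
    }
    return [nearest_label(s, centroids) for s in validation]


# Hypothetical toy data: 6 samples x 2 genes, status 1 = metastasis, 0 = none.
samples = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
status = [0, 0, 0, 1, 1, 1]

loocv_calls = loocv_nearest_centroid(samples, status)
split_calls = train_validate_nearest_centroid(
    samples[:2] + samples[3:5], [0, 0, 1, 1], [samples[2], samples[5]]
)
print(loocv_calls, split_calls)
```

In the leave-one-out version the centroids change for every sample and depend on the metastasis status you supply, whereas in the fixed-split version they are learned once from a training cohort. The class calls for a given sample can therefore differ between the two procedures even when the expression data are identical.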

You can run ?hack_cinsarc (or go here) for further info.

Thanks,

Andrea