AntonioDeFalco / SCEVAN

R package that automatically classifies the cells in scRNA data by segregating the non-malignant cells of the tumor microenvironment from the malignant cells. It also infers the copy number profile of malignant cells, identifies subclonal structures and analyses the specific and shared alterations of each subpopulation.
https://www.nature.com/articles/s41467-023-36790-9
GNU General Public License v3.0
87 stars, 25 forks

Re-running the same data produces different results? #36

Open River366 opened 1 year ago

River366 commented 1 year ago

Dear Mr. AntonioDeFalco, yesterday I re-ran the analysis on one sample and found that the results were somewhat different. The tsne_scRNA.png, the number of subclones and the tumor/non-tumor classification were the same, but all of the DEchr gene volcano plots were different. Could this be because the subclone order changed? In the first run the volcano plots came from chromosomes 1, 4, 6, 7, 12 and 14, whereas in the second run they came only from chromosomes 4, 7 and 14. I am a bit confused by this: what do these DEchr gene volcano plots represent?

Thanks for your help!

[Screenshots: 1st run, 2nd run]

River366 commented 1 year ago

Sorry to bother you. In fact, in some other samples the number of subclones also changed, which makes the results largely different. Could you provide some help?

AntonioDeFalco commented 1 year ago

It is strange that you get different results; I have repeated the analysis several times and this has never happened before. I see that the sample name is NS_19: is it by chance the sample from this dataset https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131907? I ask so that I can try to understand the problem using the same matrix.

Thanks for your interest. Regards.

River366 commented 1 year ago

Yes, the data come from GSE131907. If you have time, please try NS_02, NS_07 and NS_19, which gave different results. Another point: do these DEchr volcano plots show differential gene expression in regions with chromosomal alterations? If so, the results look a little strange where regions overlap (LX255B from GSE123902).

Thanks for your help! I look forward to hearing from you soon. Best regards.


dgacquer commented 1 month ago

Any update on this?

I am having the same issue. If I run pipelineCNA 10 times on the same input data, I get 3 different results, seemingly at random.

The tsne_CNA plot is always the same, but the cluster membership (dot color) differs between runs. I tried modifying the source code by adding set.seed(123) after library(igraph) in the subclonesTumorCells function, but it did not solve the problem.
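One way to check whether the Louvain step on its own can explain the changing memberships is to run it repeatedly on a fixed graph and compare the partitions. The R sketch below uses igraph's cluster_louvain on the built-in Zachary karate-club graph as a stand-in for the cell graph (an assumption: it does not reproduce how SCEVAN builds its graph) and scores each run against the first with the adjusted Rand index from igraph::compare. It then calls set.seed immediately before each cluster_louvain call, which should give identical memberships because the R igraph package draws its random numbers from R's own RNG.

```r
library(igraph)

# Stand-in graph; SCEVAN's own cell graph construction is NOT reproduced here
g <- make_graph("Zachary")

# Run Louvain 10 times without re-seeding and keep each membership vector
runs <- lapply(1:10, function(i) membership(cluster_louvain(g)))

# Adjusted Rand index of every run against the first (1 = identical partition)
ari <- sapply(runs, function(m) compare(runs[[1]], m, method = "adjusted.rand"))
print(round(ari, 3))

# Seeding right before each call: both runs consume the same random stream
set.seed(123)
m1 <- membership(cluster_louvain(g))
set.seed(123)
m2 <- membership(cluster_louvain(g))
print(identical(m1, m2))
```

Depending on the igraph version, the unseeded ARI values may all be 1, since older releases did not randomize the vertex order in Louvain. If the clustering is reproducible in isolation but the full pipeline still varies, the remaining randomness presumably comes from another step, which this sketch does not test.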

Best regards,

David

zvittorio commented 2 weeks ago

Hi everyone,

Do you have any updates? @dgacquer, would you say that everything except the igraph clustering is deterministic? @AntonioDeFalco, can you confirm this based on the source code?

By the way, the louvain_igraph functions have a seed argument, see here. I was not able to find where it is implemented in the source code.
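Instead of seeding globally, a seed could be passed explicitly to the clustering step. The R sketch below defines a hypothetical helper, louvain_with_seed (not part of SCEVAN or igraph), that seeds R's RNG only for the cluster_louvain call and restores the caller's RNG state afterwards, so random draws elsewhere in a pipeline are not shifted. The Zachary karate-club graph is only a stand-in for the cell graph.

```r
library(igraph)

# Hypothetical helper (NOT part of SCEVAN or igraph): run Louvain with an
# optional seed, then restore the caller's RNG state on exit.
louvain_with_seed <- function(g, seed = NULL, ...) {
  if (!is.null(seed)) {
    has_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    if (has_state) old_state <- get(".Random.seed", envir = globalenv())
    on.exit({
      if (has_state) assign(".Random.seed", old_state, envir = globalenv()) else rm(".Random.seed", envir = globalenv())
    })
    set.seed(seed)
  }
  # Extra arguments (e.g. resolution) are forwarded to cluster_louvain
  membership(cluster_louvain(g, ...))
}

g <- make_graph("Zachary")
a <- louvain_with_seed(g, seed = 1)
b <- louvain_with_seed(g, seed = 1)
print(identical(a, b))
```

Restoring the RNG state is a design choice: it keeps the seeded clustering from changing the random numbers seen by any later step, which a bare set.seed inside the function would do.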

Thank you both!