Closed (elmbeech closed this issue 4 years ago)
Dear Elmar Bucher,
Thanks for your appreciation and your feedback on my package!
I checked the genes in the PPI network matrix in several ways but cannot find such duplications. I used `duplicated` in R and there is no duplication in either matrix: for `which(duplicated(rownames(net13Jun12.m)))` the result was `integer(0)`, which means none of them are identical. [...] `net13Jun12.m` and only three in `net17Jan16.m`, which is far fewer than you observed. So I am wondering how you actually identified such duplications. Could you provide more details so that I can help you further?
Best, Weiyan
Dear Weiyan,
I see what went wrong. I am sorry for that.
I am not really an R programmer, so I exported the network to a tab-separated file like this:
library(LandSCENT)
data(net13Jun12.m)
write.table(net13Jun12.m, "net13Jun2012.original.entrez.m.tsv", sep="\t")
Then I loaded it into Python 3 to map the Entrez gene identifiers to other gene identifiers:
import pandas as pd
df_net13 = pd.read_csv("net13Jun2012.original.entrez.m.tsv", sep="\t")
Because gene identifiers do not always map one to one, I used the pandas command:
df_net13 = df_net13.drop_duplicates()
Now, `drop_duplicates` removes all duplicate rows, but it ignores the index. So it removed genes that are connected in the network in exactly the same way as another gene.
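A minimal sketch of this behaviour, using a small made-up adjacency matrix rather than the real network (the gene labels `g1` to `g4` and the name `df_toy` are hypothetical). It contrasts `drop_duplicates`, which compares row values only, with a check on the index labels, which is closer to what `duplicated(rownames(...))` does in R:

```python
import pandas as pd

# Hypothetical toy adjacency matrix: g2 and g3 are different genes
# (different index labels) with identical connection patterns.
genes = ["g1", "g2", "g3", "g4"]
df_toy = pd.DataFrame(
    [
        [0, 1, 1, 0],  # g1 is connected to g2 and g3
        [1, 0, 0, 0],  # g2 is connected to g1 only
        [1, 0, 0, 0],  # g3 is connected to g1 only (same row values as g2)
        [0, 0, 0, 0],  # g4 has no connections
    ],
    index=genes,
    columns=genes,
)

# drop_duplicates compares the row values and ignores the index,
# so g3 is dropped although it is a distinct gene.
dropped = df_toy.drop_duplicates()
print(list(dropped.index))

# Checking the index labels instead flags only repeated identifiers.
has_duplicate_ids = df_toy.index.duplicated().any()
print(has_duplicate_ids)

# Keeping the first occurrence of each label leaves the toy matrix unchanged.
by_label = df_toy[~df_toy.index.duplicated(keep="first")]
print(by_label.shape)
```

In this toy case `drop_duplicates` loses a gene, while the label-based filter keeps all four rows, which would explain why the two ways of counting duplicates disagree.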
I am sorry about that! I think I can close this issue. Elmar
Dear Chen Weiyan,
I deeply admire your group's research work and really appreciate that you wrote this awesome R package! (Although I would have preferred a Python package, but that's a matter of taste.)
I don't know if it matters for the calculation, but I realized that your PPI matrix files contain duplicate rows and columns.
I attached tab-separated value files with the duplicated rows dropped. Maybe they are useful. net13Jun2012.entrez.m.tsv.gz net17Jan2016.entrez.m.tsv.gz
Best, Elmar Bucher