Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference
GNU General Public License v3.0

Problems reproducing Fig. 5 results #67

Open JaneJiayiDong opened 2 years ago

JaneJiayiDong commented 2 years ago

Hello, and sorry to bother you. I am having trouble reproducing the results in Fig. 5 of the paper. I downloaded the data (BEELINE-data and Networks) from Zenodo and ran generateExpInputs.py.

  1. I used the mESC expression data and the network Non-Specific-ChIP-seq-network.csv, leaving the other parameters at their defaults. This produced the following error:

    Traceback (most recent call last):
    File "generateExpInputs_raw.py", line 171, in <module>
    print("\n#TFs: %d, #Genes: %d, #Edges: %d, Density: %.3f" % (nTFs,nGenes,netDF.shape[0],netDF.shape[0]/((nTFs*nGenes)-nTFs)))
    ZeroDivisionError: division by zero

    I found that the gene names in Non-Specific-ChIP-seq-network.csv are uppercase, unlike those in ExpressionData.csv, so I added expr_df.index = expr_df.index.to_series().apply(lambda x:x.upper()) before expr_df.to_csv(opts.outPrefix+'-ExpressionData.csv'). The result is:

    #TFs: 27, #Genes: 144, #Edges: 264, Density: 0.068

  2. After reading issue #65, I tried to reproduce the results for the hESC dataset using the STRING ground-truth network, and the result is:

    #TFs: 28, #Genes: 82, #Edges: 112, Density: 0.049
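The ZeroDivisionError in item 1 is consistent with no gene names matching between the expression data and the network, which would leave both nTFs and nGenes at zero. Below is a minimal sketch of the density formula taken from the print statement in the traceback, with a guard for that case. The function name `network_density` and the error message are hypothetical and not part of generateExpInputs.py.

```python
def network_density(n_tfs, n_genes, n_edges):
    """Density as computed in the traceback: edges / (nTFs * nGenes - nTFs)."""
    possible_edges = n_tfs * n_genes - n_tfs
    if possible_edges == 0:
        # Hypothetical guard: fail with a hint instead of ZeroDivisionError.
        raise ValueError(
            "No TFs/genes left after filtering; check that gene names in the "
            "expression data and the network use the same case."
        )
    return n_edges / possible_edges


# Counts reported in items 1 and 2 above.
print("Density: %.3f" % network_density(27, 144, 264))
print("Density: %.3f" % network_density(28, 82, 112))
```

With the counts reported above, this prints 0.068 and 0.049, matching the values in the script output.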

I would appreciate some help with these problems. Perhaps I have missed some data preprocessing steps; any advice would be welcome.

Thank you. Best wishes, Jiayi Dong

JaneJiayiDong commented 2 years ago

After checking again, I found that it was just a simple error. With the following modifications, we get the same results as in Fig. 5.

print("\nReading %s" % (expr_file))
expr_df = pd.read_csv(expr_file, header=0, index_col=0)
# Upper-case the gene names so they match the network file
expr_df.index = expr_df.index.to_series().apply(lambda x: x.upper())
print("\nReading %s" % (gene_ordering_file))
gene_df = pd.read_csv(gene_ordering_file, header=0, index_col=0)
# Apply the same upper-casing to the gene ordering file
gene_df.index = gene_df.index.to_series().apply(lambda x: x.upper())
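To see why the upper-casing matters, here is a small self-contained sketch that counts how many expression gene names also appear in the network, before and after normalizing case. The toy data, the helper `count_shared_genes`, and the column names `Gene1` and `Gene2` are assumptions for illustration. The sketch also uses `Index.str.upper()`, a shorter way of upper-casing the index that should be equivalent to the `apply` call above for string indices.

```python
import io

import pandas as pd

# Toy stand-ins for ExpressionData.csv and a network file (made-up values).
expr_csv = io.StringIO(
    ",cell1,cell2\nPou5f1,1.0,2.0\nSox2,0.5,0.1\nNanog,3.0,1.5\n"
)
# Assumption: the network has two columns named Gene1 and Gene2.
net_csv = io.StringIO("Gene1,Gene2\nPOU5F1,SOX2\nSOX2,NANOG\nPOU5F1,GATA1\n")

expr_df = pd.read_csv(expr_csv, header=0, index_col=0)
net_df = pd.read_csv(net_csv)


def count_shared_genes(expr_index, net):
    """Number of expression gene names that also occur in the network."""
    net_genes = set(net["Gene1"]) | set(net["Gene2"])
    return len(set(expr_index) & net_genes)


before = count_shared_genes(expr_df.index, net_df)

# Normalize case on the expression side, as in the modification above.
expr_df.index = expr_df.index.str.upper()
after = count_shared_genes(expr_df.index, net_df)

print("shared genes before: %d, after: %d" % (before, after))
```

On this toy data the overlap goes from 0 to 3 once the names are upper-cased. A check like this before filtering makes a case mismatch visible early.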