Murali-group / Beeline

BEELINE: evaluation of algorithms for gene regulatory network inference
GNU General Public License v3.0

Duplicated rows on refNetwork.csv #38

Closed by joseale2310 4 years ago

joseale2310 commented 4 years ago

Hi!

I am quite interested in the evaluation of different GRN inference models. However, I have a question. What is the point of having duplicated rows in the refNetwork.csv example file?

For example, the first 3 interactions are duplicated (SOX9 is also autoregulated). Some interactions also appear twice in opposite order. Are the interactions in this file directional, and does this mean the two genes regulate each other?

Thanks!

adyprat commented 4 years ago

Hi, we normally take the unique rows of the refNetwork.csv file in BLEval. Since duplicated rows represent the same edge, we remove duplicates when computing AUPRC, early precision, and so on. As for why some rows are duplicated in refNetwork.csv: that file was generated automatically by parsing the Boolean rule for each gene/TF in the GSD model. So if the rule for gene A in the model was A := (B and C) or (B and D), you might see the edge B->A duplicated (the other edges are C->A and D->A).
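To make the duplication concrete, here is a minimal sketch of how emitting one edge per gene mention in a Boolean rule can produce repeated rows, and how taking unique rows removes them. This is only an illustration, not the actual script that generated refNetwork.csv or the BLEval code; the function name `edges_from_rule` and the regex-based tokenizing are assumptions made for the example.

```python
import re

# Hypothetical helper: emit one (regulator, target) edge for every gene
# mention in a Boolean rule. This is NOT the BEELINE generation script.
def edges_from_rule(target, rule):
    keywords = {"and", "or", "not"}
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", rule)
    return [(tok, target) for tok in tokens if tok not in keywords]

# Rule from the comment above: A := (B and C) or (B and D)
raw_edges = edges_from_rule("A", "(B and C) or (B and D)")
# B is mentioned twice, so the edge B->A appears twice in raw_edges.

# Keep only unique edges, preserving first-seen order.
unique_edges = list(dict.fromkeys(raw_edges))

print(raw_edges)
print(unique_edges)
```

Because the tuples are ordered (regulator, target), deduplicating this way only merges rows that are exactly identical; B->A and A->B would remain two separate edges.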

As for the second question, you are correct that the edge list represents a directed network, so if the edges A->B and B->A both appear in refNetwork.csv, they simply represent a mutual interaction. We also include auto-regulation if it is present in the ground truth.
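As a rough sketch of what the directed reading implies, the snippet below separates self-loops (auto-regulation) from mutual interactions in a small edge set. The edge list is made up for illustration (only the SOX9 self-loop is taken from the question above), and none of this is BLEval code.

```python
# Illustrative directed edges as (regulator, target) pairs.
# A, B, C are placeholder gene names; SOX9 -> SOX9 mirrors the
# auto-regulation mentioned in the question.
edges = {("SOX9", "SOX9"), ("A", "B"), ("B", "A"), ("C", "A")}

# Auto-regulation: regulator and target are the same gene.
self_loops = sorted(e for e in edges if e[0] == e[1])

# Mutual interaction: both A->B and B->A are present.
# Each pair is reported once, with its two genes in sorted order.
mutual = sorted(
    {tuple(sorted(e)) for e in edges if e[0] != e[1] and (e[1], e[0]) in edges}
)

print(self_loops)
print(mutual)
```

Here C->A has no reverse edge, so it stays a one-way interaction.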

Hope that helps. Best, Aditya