Open TheChenSu opened 1 year ago
The confidence scores are based on a rank-summing metric, and their absolute value is largely meaningless; they're only used to order the results from most confident to least confident. If you'd like to look at all of them, they're saved as a genes-by-TFs matrix in the combined_confidences.tsv file in the output directory. The values that are trimmed out of the network.tsv file are zeros: these are network edges for which no evidence exists at all.
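If a flat TF-gene edge list is more convenient than the matrix, something like the following could work. This is only a sketch: it assumes the file is tab-separated with one row per gene, one column per TF, and a leading header cell above the gene names. The function name `confidences_to_edges` and the toy values are made up for illustration and are not part of the package.

```python
import csv
import io


def confidences_to_edges(handle):
    """Flatten a genes-by-TFs confidence matrix into (tf, gene, score) tuples.

    Assumes a tab-separated layout: the header row holds a leading label cell
    followed by TF names, and each later row holds a gene name followed by one
    score per TF.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    tfs = header[1:]  # skip the leading cell above the gene names
    edges = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene = row[0]
        for tf, value in zip(tfs, row[1:]):
            edges.append((tf, gene, float(value)))
    # Most confident edges first; zero-score edges end up at the bottom.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges


# Toy stand-in for combined_confidences.tsv (values are invented).
example = io.StringIO(
    "gene\ttfA\ttfB\n"
    "g1\t0.9\t0.0\n"
    "g2\t0.45\t0.7\n"
)
edges = confidences_to_edges(example)
for tf, gene, score in edges:
    print(tf, gene, score)
```

For a real run you would pass an open file handle, e.g. `open("combined_confidences.tsv", newline="")`, instead of the `StringIO` object.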
Generally, you need some criterion for choosing a confidence score threshold, and it should be based on existing network knowledge. You could choose the threshold that maximizes MCC (Matthews correlation coefficient) or F1, based on the recovery of the prior (or better yet, on some gold standard part of your network knowledge that you've held out of the model training).
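As a rough illustration of that idea, here is a minimal sketch that scans candidate thresholds and keeps the one with the best F1 or MCC. It assumes you have already paired each edge's confidence score with a true/false label from your gold standard (or prior); the helper names and the toy data are hypothetical and not part of the package.

```python
import math


def confusion(scored_edges, threshold):
    """Count TP, FP, FN, TN when edges with score >= threshold are called."""
    tp = fp = fn = tn = 0
    for score, is_true in scored_edges:
        predicted = score >= threshold
        if predicted and is_true:
            tp += 1
        elif predicted and not is_true:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1(tp, fp, fn, tn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scored_edges, metric):
    """Return (threshold, metric value) for the best-scoring cutoff."""
    # Every distinct non-zero score is a candidate cutoff.
    candidates = sorted({score for score, _ in scored_edges if score > 0})
    best = max(candidates, key=lambda t: metric(*confusion(scored_edges, t)))
    return best, metric(*confusion(scored_edges, best))


# Invented (confidence, in_gold_standard) pairs, for illustration only.
toy = [
    (0.95, True), (0.9, True), (0.8, False), (0.7, True),
    (0.5, False), (0.45, False), (0.0, False), (0.0, True),
]
f1_cut, f1_val = best_threshold(toy, f1)
mcc_cut, mcc_val = best_threshold(toy, mcc)
print("F1-optimal threshold:", f1_cut)
print("MCC-optimal threshold:", mcc_cut)
```

The two metrics do not have to agree on the cutoff, so it is worth deciding up front which one matches your goals (F1 ignores true negatives, MCC does not).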
I am currently using 'bsubtilis_network_inference_run_script.py' as a template for my project, which uses a prior binary network (14k genes x 22 TFs) and a VOOM-quantile normalized gene expression dataset (14k genes x 17 samples). I would like to retrieve the complete list of TF-gene edges along with their associated confidence scores, ranging from 0 to 1. Currently, the edges I am getting have a minimum score of 0.45.
Could you please guide me on the specific parameters I should adjust in the script to achieve this? Also, could you recommend a suitable threshold for the confidence scores to filter out the most significant edges? I understand that this may vary depending on the nature of the data and the specific project objectives, but any general guidance or best practices would be greatly appreciated. Thank you.