flatironinstitute / inferelator

Task-based gene regulatory network inference using single-cell or bulk gene expression data conditioned on a prior network.
BSD 2-Clause "Simplified" License

gene has to contain regulator? #42

Closed frankligy closed 1 year ago

frankligy commented 3 years ago

Hi,

Thanks for the awesome tool!

One question: do the TF regulators (the tf_names file) have to be in the gene list (the genes included in the expression matrix)? Imagine a situation (a made-up example for illustration) where I am only interested in 1000 genes, so my expression matrix is, say, 1000 × 200 (200 samples), and my prior is 1000 × 700 (700 TF regulators). None of my TF regulators are among these 1000 genes. In this case, would the current version of the inferelator run properly?

I am asking because, in a real case similar to the one described above, I got an error when running the inferelator:

 Loading expression data file expr.tsv
 No metadata provided. Creating a generic metadata
 Loaded expr.tsv:
Data loaded: InferelatorData [float64 (583, 3335), Metadata (583, 4)] Memory: 15.55 MB
Traceback (most recent call last):
  File "<input>", line 8, in <module>
  File "/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/tfa_workflow.py", line 119, in run
    self.startup()
  File "/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/workflow.py", line 798, in startup
    self.startup_run()
  File "/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/tfa_workflow.py", line 129, in startup_run
    self.process_priors_and_gold_standard()
  File "/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/workflow.py", line 851, in process_priors_and_gold_standard
    self.gold_standard)
  File "/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/preprocessing/priors.py", line 42, in validate_priors_gold_standard
    check.index_values_unique(gold_standard.index)
AttributeError: 'NoneType' object has no attribute 'index'

To clarify, I didn't specify a gold standard file when setting the file paths, because I understood from the documentation that it is optional. So I'm not sure whether this error is caused by the missing gold standard file or by the situation I described at the beginning. I would really appreciate any hints!

Thanks in advance, Frank

asistradition commented 3 years ago

It shouldn't be a problem that the TFs aren't in your expression data - provided they're connected to genes in your prior you won't have any issues.
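A quick way to check this before a run is to compare the identifiers directly. The sketch below is not part of the inferelator: the helper name `summarize_tf_coverage` and the toy identifiers are made up for illustration, and it assumes the prior can be represented as a mapping from each TF to its target genes.

```python
def summarize_tf_coverage(expression_genes, tf_names, prior_edges):
    """Report how the TF list relates to the expression genes and the prior.

    expression_genes: iterable of gene IDs in the expression matrix
    tf_names: iterable of TF IDs (contents of the tf_names file)
    prior_edges: dict mapping TF ID -> iterable of target gene IDs
    """
    genes = set(expression_genes)
    tfs = set(tf_names)

    # TFs that also appear among the genes of the expression matrix
    tfs_in_expression = tfs & genes

    # TFs with at least one prior target present in the expression data
    tfs_connected = {
        tf for tf in tfs
        if genes & set(prior_edges.get(tf, ()))
    }

    return {
        "tfs_in_expression": sorted(tfs_in_expression),
        "tfs_connected_via_prior": sorted(tfs_connected),
        "tfs_unconnected": sorted(tfs - tfs_connected),
    }


# Toy data: no TF is in the expression matrix, but two have prior targets
report = summarize_tf_coverage(
    expression_genes=["geneA", "geneB", "geneC"],
    tf_names=["TF1", "TF2", "TF3"],
    prior_edges={
        "TF1": ["geneA", "geneB"],
        "TF2": ["geneC"],
        "TF3": ["geneZ"],  # target is absent from the expression data
    },
)
print(report)
```

In this toy case `tfs_in_expression` is empty, which matches the scenario in the question, while TF1 and TF2 are still connected to expressed genes through the prior.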

The error you're getting occurs because no gold standard was provided: you need one to score against, or the model metrics won't be meaningful. You can override this requirement by setting a flag as follows:

worker.set_network_data_flags(use_no_gold_standard=True)

This will ignore the requirement to have something to score against, but the metrics produced at the end of the run will be meaningless. You should have encountered a clear error message instead of dying with an inexplicable traceback - I'll make sure to fix that and add it to the next release.
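For context, a minimal sketch of where that flag could sit in a run script. Only `expr.tsv` and the `set_network_data_flags` call come from this thread. The directory names, the other file names, and the choice of `regression="bbsr"` are assumptions for illustration, and the `inferelator_workflow` and `set_file_paths` calls should be checked against the documentation for the installed version.

```python
# Settings are kept in plain dicts so they can be inspected before a run.
FILE_PATHS = {
    "input_dir": "data",                    # hypothetical directory
    "output_dir": "output",                 # hypothetical directory
    "expression_matrix_file": "expr.tsv",   # file name from the log in this thread
    "tf_names_file": "tf_names.tsv",        # hypothetical file name
    "priors_file": "prior.tsv",             # hypothetical file name
    # gold_standard_file is deliberately left out
}

# Flag suggested in this thread: skip the gold standard requirement
NETWORK_FLAGS = {"use_no_gold_standard": True}


def build_worker():
    """Create and configure a workflow object (assumed setup, see lead-in)."""
    # Imported here so the settings above can be checked without the package
    from inferelator import inferelator_workflow

    # "tfa" matches the tfa_workflow seen in the traceback; "bbsr" is an assumption
    worker = inferelator_workflow(regression="bbsr", workflow="tfa")
    worker.set_file_paths(**FILE_PATHS)
    worker.set_network_data_flags(**NETWORK_FLAGS)
    return worker
```

Calling `build_worker().run()` would then start the inference. Any scoring metrics reported at the end should be disregarded, for the reason given above.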

frankligy commented 3 years ago

Thanks a lot for the prompt reply!

I added the line you suggested and it now works, but I did notice a UserWarning:

/Users/ligk2e/opt/anaconda3/envs/inferelator/lib/python3.6/site-packages/inferelator/workflow.py:278: UserWarning: Omitting prior network data is not recommended. Use at your own risk.
  warnings.warn("Omitting prior network data is not recommended. Use at your own risk.")

I suspect the line I added just disables the performance evaluation (precision, recall, etc.), but I still want to include my prior network in the inference (I did specify the prior file in my set_file_paths call). In this scenario, was my prior network used in the regression?

Thanks, Frank

asistradition commented 3 years ago

Your prior will be used, yes - that message appears if you set either use_no_gold_standard or use_no_prior; I will make sure it's clearer in the next release. The messaging around these flags is a little weak because they don't get used as much.
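To illustrate why the warning can be misleading, here is a hypothetical stand-in for the behavior described above. It is not the inferelator's actual code: the function name `check_network_flags` and the returned dictionary are invented for this sketch. It only shows how one shared message, triggered by either flag, says nothing about which input is actually omitted.

```python
import warnings


def check_network_flags(use_no_prior=False, use_no_gold_standard=False):
    """Hypothetical stand-in: one shared warning covers both flags."""
    if use_no_prior or use_no_gold_standard:
        # Same text regardless of which flag was set
        warnings.warn(
            "Omitting prior network data is not recommended. "
            "Use at your own risk."
        )
    return {
        "prior_used": not use_no_prior,
        "scored_against_gold_standard": not use_no_gold_standard,
    }


# Capture the warning so the example runs quietly
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    status = check_network_flags(use_no_gold_standard=True)

print(status)
print(len(caught), "warning(s) raised")
```

With only `use_no_gold_standard=True`, the warning still fires even though the prior remains in use.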

frankligy commented 3 years ago

Much appreciated!