flatironinstitute / inferelator

Task-based gene regulatory network inference using single-cell or bulk gene expression data conditioned on a prior network.
BSD 2-Clause "Simplified" License

reproduce network from paper #39

Closed koenvandenberge closed 3 years ago

koenvandenberge commented 3 years ago

Hi All,

Thanks for developing and maintaining inferelator. I was wondering what code should be used to (largely) reproduce the results of the Jackson et al. 2020 eLife paper with the current version of inferelator. I'm specifically referring to the 'final' network of Figure 6. From the Supplementary files provided with the paper, I've been able to figure out what, I think, should be used as expression_matrix_file and gold_standard_file. Could you please point me to the files that should be used for the three remaining arguments: tf_names_file, meta_data_file, and priors_file?

Thanks!

from inferelator import inferelator_workflow

worker = inferelator_workflow(regression="bbsr", workflow="tfa")
worker.set_file_paths(input_dir=".",
                      output_dir="./output_inferelator",
                      expression_matrix_file="103118_SS_Data.tsv.gz",  # provided
                      tf_names_file="regulators.tsv",  # not provided
                      meta_data_file="meta_data.tsv",  # not provided
                      priors_file="priors.tsv",  # not provided
                      gold_standard_file="supplementary/source1/priors/YEASTRACT_20190713_BOTH.tsv")  # provided

asistradition commented 3 years ago

Good morning-

The file YEASTRACT_20190713_BOTH.tsv should be used as the priors_file for Figure 6 from the 2020 eLife paper (it is also used as the gold standard file for Figure 6).

The tf_names_file is tf_names_yeastract.txt which should be included with the eLife supplemental data package. I'm attaching it here as well.

The metadata is included in the expression data file 103118_SS_Data.tsv.gz. There are 5 columns which should be automatically extracted as metadata (since they're all categorical strings instead of numeric data). If you'd like to remove them manually, they are: ['Genotype', 'Genotype_Group', 'Replicate', 'Condition', 'tenXBarcode'].
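For anyone who wants to separate those columns by hand, here is a minimal sketch using only the Python standard library. It runs on a tiny in-memory table with made-up values and hypothetical gene columns; the real file's layout (for example, whether it carries an index column) is not described in this thread, so treat the parsing details as assumptions rather than a description of how inferelator does it.

```python
import csv
import io

# The five metadata column names listed in the reply above
METADATA_COLUMNS = ['Genotype', 'Genotype_Group', 'Replicate', 'Condition', 'tenXBarcode']


def split_metadata(tsv_handle, metadata_columns=METADATA_COLUMNS):
    """Split each row of a TSV into a metadata dict and a numeric expression dict."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    meta_rows, expr_rows = [], []
    for row in reader:
        # Metadata columns stay as strings
        meta_rows.append({k: row[k] for k in metadata_columns if k in row})
        # Every other column is assumed to hold a numeric expression value
        expr_rows.append({k: float(v) for k, v in row.items()
                          if k not in metadata_columns})
    return meta_rows, expr_rows


# Toy two-cell table; gene names and all values are illustrative only
toy = io.StringIO(
    "YAL001C\tYAL002W\tGenotype\tGenotype_Group\tReplicate\tCondition\ttenXBarcode\n"
    "0\t3\tWT\tWT\tA\tYPD\tAAAC\n"
    "2\t1\tgcn4\tgcn4\tB\tYPD\tAAAG\n"
)
meta, expr = split_metadata(toy)
```

To try the same function on a gzipped file, a text-mode handle from `gzip.open(path, "rt")` can be passed in place of the `io.StringIO` object.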

In addition, Figure 6 was generated with the AMuSR multi-task model.

Please let me know if you have any other questions.
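Putting the file answers above together, the call from the original question could be filled in roughly as follows. This is a sketch under assumptions, not the script used for the paper: it keeps the `bbsr`/`tfa` settings from the question (Figure 6 used the AMuSR multi-task model, whose task setup is not shown here), it assumes tf_names_yeastract.txt has been copied into the input directory, and the `set_file_properties` keywords are an assumption about the current inferelator API that should be checked against the installed version. The import sits inside the function so the settings can be inspected without inferelator installed.

```python
# File arguments as described in the reply above; no separate meta_data_file
# is passed because the metadata lives inside the expression file.
FILE_PATHS = {
    "input_dir": ".",
    "output_dir": "./output_inferelator",
    "expression_matrix_file": "103118_SS_Data.tsv.gz",
    # Assumption: the TF names file has been copied next to the other inputs
    "tf_names_file": "tf_names_yeastract.txt",
    # The same YEASTRACT file serves as both priors and gold standard
    "priors_file": "supplementary/source1/priors/YEASTRACT_20190713_BOTH.tsv",
    "gold_standard_file": "supplementary/source1/priors/YEASTRACT_20190713_BOTH.tsv",
}

METADATA_COLUMNS = ['Genotype', 'Genotype_Group', 'Replicate', 'Condition', 'tenXBarcode']


def build_worker():
    """Create and configure a workflow object (requires inferelator)."""
    from inferelator import inferelator_workflow

    # Settings copied from the question; not the AMuSR setup used for Figure 6
    worker = inferelator_workflow(regression="bbsr", workflow="tfa")
    worker.set_file_paths(**FILE_PATHS)
    # Assumed keywords: name the metadata columns explicitly instead of
    # relying on automatic extraction
    worker.set_file_properties(
        extract_metadata_from_expression_matrix=True,
        expression_matrix_metadata=METADATA_COLUMNS,
    )
    return worker
```

Calling `build_worker().run()` would then start inference, provided the files exist at those paths.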

koenvandenberge commented 3 years ago

Hi @asistradition

Thanks very much for the quick reply. Is it possible you forgot to attach the files mentioned? You mention they should be included with the eLife supplemental data package but I can't seem to find them.

asistradition commented 3 years ago

Sorry, GitHub decided not to attach it. I've gone ahead and added the tf_names file to the inferelator repo itself in the data directory.

koenvandenberge commented 3 years ago

Fantastic, many thanks for the help!