usage of metadata - Githubissues

JamalEH commented 4 years ago

Dear TFEA team,

Thank you so much for the great tool TFEA, which allows easy integration of ChIP-seq data.

I have several questions related to the usage of metadata:

Can I restrict my analysis to a specific cell line, using ChIP-seq data for one specific cell line under a specific condition (e.g. vehicle)? I'm working with MCF-7 cells and I would like to identify the enriched TFs as putative regulators of my up- and downregulated genes.
Is it also possible to perform the analysis using the set of DE isoforms instead of genes? My aim here is to build a map of TF-isoform relationships and to see whether my dataset shows a differential enrichment of TFs when comparing up- versus downregulated isoforms. Which window size, in kilobases around an isoform, should I use in this case?

My last question: do you take into account the genome assembly version used in the different ChIP-seq datasets? Or is it not important, since you do not use the exact coordinates but provide a contingency matrix recording the presence or absence of a peak over a given gene?

Your comments will be much appreciated! Thank you in advance!

Kind regards, Jamal.

LauraPS1 commented 4 years ago

Hi Jamal,

Can I restrict my analysis to a specific cell line, using ChIP-seq data for one specific cell line under a specific condition (e.g. vehicle)? I'm working with MCF-7 cells and I would like to identify the enriched TFs as putative regulators of my up- and downregulated genes.

Yes, you only need a data frame with the columns 'Accession' and 'TF' to act as a filter in contingency_matrix() or GSEA_run():

data( "MetaData", package = "TFEA.ChIP" )
mcf7_index <- MetaData[ MetaData$Cell == "MCF7" , c( "Accession", "TF" ) ]
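To also narrow the filter down to one condition, the same subsetting could be extended with a second test on the treatment column. The following is only a minimal sketch: it uses a small invented data frame as a stand-in for MetaData so that it runs without the package, and the "Treatment" column name, the "vehicle" label, and the accession IDs are assumptions made for illustration. The labels in the real table may be spelled differently, so it is worth checking them (for example with unique( MetaData$Treatment )) before filtering.

```r
# Toy stand-in for the MetaData table shipped with TFEA.ChIP.
# All rows and values below are invented for illustration only.
MetaData <- data.frame(
  Accession = c("EXP001", "EXP002", "EXP003", "EXP004"),
  Cell      = c("MCF7", "MCF7", "HeLa", "MCF7"),
  Treatment = c("vehicle", "estradiol", "none", "vehicle"),
  TF        = c("ESR1", "ESR1", "MYC", "FOXA1"),
  stringsAsFactors = FALSE
)

# Keep only experiments done in MCF7 cells under the vehicle condition.
keep <- MetaData$Cell == "MCF7" & MetaData$Treatment == "vehicle"

# The filter only needs the 'Accession' and 'TF' columns.
mcf7_vehicle_index <- MetaData[keep, c("Accession", "TF")]

print(mcf7_vehicle_index)

# With the real package loaded, the filtered index would then be passed on,
# for example (up_genes and control_genes are hypothetical gene ID vectors):
# CM <- contingency_matrix(up_genes, control_genes, chip_index = mcf7_vehicle_index)
```

Combining both tests into one logical vector keeps the filter readable and makes it easy to add further criteria later.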

Is it also possible to perform the analysis using the set of DE isoforms instead of genes?

With our pre-made databases, not at the moment. Our current TF-gene database is based on regulatory elements defined in GeneHancer, which does not include isoform-specific information.

My last question: do you take into account the genome assembly version used in the different ChIP-seq datasets? Or is it not important, since you do not use the exact coordinates but provide a contingency matrix recording the presence or absence of a peak over a given gene?

As long as the ChIP-seq peaks and the database used to link peaks to genes are on the same assembly version, it shouldn't create any issues.

JamalEH commented 4 years ago

Dear Laura, I got it. Thank you for your prompt answers!

I like your tool :)

Kind regards, Jamal.

LauraPS1 commented 4 years ago

No problem, and thank you :)

Cheers

LauraPS1 / TFEA.ChIP_downloads

usage of metadata #2