Dear Ronnie,
Thanks for your questions.
For example, when the epigenomics file is named ENCFF045ZYD_upper-lobe-of-left-lung_H3K4me3-human.bed, the DNA element H3K4me3 will include this file's results, because H3K4me3 is contained in the filename. This allows multiple epigenomics files to be considered for H3K4me3.
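The filename matching described above can be sketched roughly as follows. This is only an illustration of substring-based grouping, not SPT's actual code; the function name `group_files_by_element` and the second and third filenames are hypothetical.

```python
from collections import defaultdict


def group_files_by_element(filenames, elements):
    """Map each DNA element name to the files whose names contain it.

    Hypothetical helper: plain substring matching, as described in the reply.
    """
    groups = defaultdict(list)
    for name in filenames:
        for element in elements:
            if element in name:
                groups[element].append(name)
    return dict(groups)


files = [
    "ENCFF045ZYD_upper-lobe-of-left-lung_H3K4me3-human.bed",
    "example2_lung_H3K4me3-human.bed",  # hypothetical filename
    "example3_lung_H3K27ac-human.bed",  # hypothetical filename
]
groups = group_files_by_element(files, ["H3K4me3", "H3K27ac"])
for element, matched in groups.items():
    print(element, len(matched))
```

With substring matching, an element name that is a prefix of another would match both sets of files, so element names need to be specific enough to avoid that.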
mutation_annotation_integration
sample_based
lncRNA
These are SPT parameters that are not maintained in the current version. For example, sample_based produces a huge number of figures if we provide separate topography figures for each sample. I have hidden these parameters for the sake of simplicity.
Aggregated mode considers all mutations, whereas signature-based analysis considers the mutations assigned to the signature of interest. In aggregate mode, the probabilities are not taken into account.
You can download histone modifications and transcription factor binding sites files from ENCODE (in bed format) and run the SPT.
Could you please provide information about the genome annotations of ChromHMM?
I hope our manuscript will be available on bioRxiv, which should make all these analyses clearer.
You can see the updated SPT version 1.0.81.
Best wishes, Burcak
I'm closing this issue. If you have any further questions, please let me know.
Hi Burcak,
Thanks for the responses!
Can you explain a bit more about aggregated mode? What do you mean by "the probabilities are not taken into account"? How will this affect the results compared to running with the probabilities? I ask because I still see signature-specific results in the aggregated-mode output.
To clarify the ChromHMM question: as mentioned, I am interested in calculating observed / expected ratios using this genome annotation. This is what it looks like:
chr1 841999 842400
chr1 845599 846000
chr1 858599 858800
chr1 876399 877000
chr1 901199 901600
chr1 937399 937600
chr1 940399 941200
chr1 949799 950200
You can see that it is just genomic ranges, with no signal associated with them. So I am wondering whether the signal column is required by your program, or whether it is possible to calculate observed / expected ratios given genomic ranges alone.
Thanks, Ronnie
Hi @burcakotlu, wondering if you saw my last comment here. It seems to have closed prematurely.
Best, Ronnie
Dear Ronnie,
In aggregated mode, all mutations are considered in the topography analyses without assigning them to any specific mutational signature. In signature-specific mode, each mutation is considered only for the signature it is assigned to through the probabilities. SPT provides resulting figures both for aggregated mode (considering all mutations) and for each specific mutational signature, as long as this is possible.
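To make the distinction concrete, here is a minimal sketch of the two ways of selecting mutations. It is not SPT's implementation: the record layout, the function names, and the `cutoff` value are all assumptions made for illustration, and the thread does not say how SPT turns probabilities into assignments.

```python
def aggregated(mutations):
    """Aggregated mode: every mutation is used; probabilities are ignored."""
    return list(mutations)


def signature_specific(mutations, signature, cutoff=0.5):
    """Signature-specific mode (illustrative): keep mutations whose
    probability for `signature` reaches a hypothetical cutoff."""
    return [
        m for m in mutations
        if m["probabilities"].get(signature, 0.0) >= cutoff
    ]


# Hypothetical mutation records with per-signature probabilities.
muts = [
    {"id": "m1", "probabilities": {"SBS1": 0.9, "SBS5": 0.1}},
    {"id": "m2", "probabilities": {"SBS1": 0.2, "SBS5": 0.8}},
    {"id": "m3", "probabilities": {"SBS1": 0.4, "SBS5": 0.6}},
]
all_muts = aggregated(muts)
sbs1_muts = signature_specific(muts, "SBS1")
print(len(all_muts), [m["id"] for m in sbs1_muts])
```

Under this kind of filtering, a signature keeps only a subset of the mutations, while the aggregated set always contains all of them.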
Our occupancy analyses are designed for library files that have a signal column (e.g., ENCODE ChIP-seq narrowPeak bed files). Unfortunately, they won't work for ChromHMM files that contain genomic ranges only. As a workaround, you can add a 4th column (1-based) containing a default signal value of 1.
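A rough sketch of that workaround, assuming whitespace-separated three-column range lines like the ones posted earlier in this thread. The helper name `add_constant_signal` is hypothetical, and the exact column layout SPT expects should be checked against its documentation.

```python
def add_constant_signal(bed_lines, signal=1):
    """Return tab-separated lines with a constant signal as the 4th column.

    Hypothetical helper: only chrom, start, and end are kept from each range
    line; comment, track, and browser lines pass through unchanged.
    """
    out = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        chrom, start, end = line.split()[:3]
        out.append("\t".join([chrom, start, end, str(signal)]))
    return out


regions = ["chr1 841999 842400", "chr1 845599 846000"]
with_signal = add_constant_signal(regions)
print("\n".join(with_signal))
```

To process a file, the same function can be applied to the lines read from the ranges file, and the result written out as a new bed file.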
I couldn't see this message on GitHub before, but I got an email and replied to that email on Dec 31, 2023. Now, that email has also shown up on GitHub. So, I copied my former answer here.
If you have any questions, please let me know.
Best wishes, Happy New Year! Burcak Otlu
Hi @burcakotlu,
Thanks for the reply! Regarding 2, if I just put 1 for the signal, how should I interpret the output files? Does it make sense to do this? As I've mentioned, I am interested in calculating observed / expected ratios.
Thanks, Ronnie
Dear Ronnie,
Since we don't know the signal values, I suggest providing a signal value of 1. With this approach, you can only test whether your mutations fall into these regions preferentially compared to the simulated mutations. However, this is a suboptimal way to use these files, because we cannot capture signal differences among the regions they contain.
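The kind of comparison described here can be sketched independently of SPT as an observed / expected ratio: the fraction of real mutations inside the regions, divided by the mean fraction across the simulations. The sketch assumes non-overlapping, half-open (start, end) regions and mutation positions in the same coordinate system. All function names are hypothetical and the example mutations are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def index_regions(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return {chrom: sorted(ivs) for chrom, ivs in by_chrom.items()}


def in_regions(index, chrom, pos):
    """True if pos lies in a region (assumes non-overlapping regions)."""
    intervals = index.get(chrom, [])
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def fraction_in_regions(index, mutations):
    """Fraction of (chrom, pos) mutations that fall inside any region."""
    if not mutations:
        return 0.0
    hits = sum(in_regions(index, chrom, pos) for chrom, pos in mutations)
    return hits / len(mutations)


def observed_over_expected(index, observed, simulations):
    """Observed fraction divided by the mean simulated fraction."""
    obs = fraction_in_regions(index, observed)
    exp = sum(fraction_in_regions(index, s) for s in simulations) / len(simulations)
    return obs / exp if exp > 0 else float("nan")


index = index_regions([("chr1", 841999, 842400), ("chr1", 845599, 846000)])
observed = [("chr1", 842000), ("chr1", 845700), ("chr1", 900000), ("chr1", 843000)]
simulations = [
    [("chr1", 842100), ("chr1", 1), ("chr1", 2), ("chr1", 3)],
    [("chr1", 5), ("chr1", 6), ("chr1", 7), ("chr1", 845600)],
]
ratio = observed_over_expected(index, observed, simulations)
print(ratio)
```

A ratio above 1 would suggest that the real mutations fall into the regions more often than the simulated ones, and a ratio below 1 the opposite. With a constant signal of 1, only this presence-or-absence comparison is informative.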
Best, Burcak
Dear Ronnie,
I will close this issue.
If you have any questions, please feel free to ask.
Best wishes, Burcak
Hi Burçak,
Can you please help me answer some questions about your great program...
What do these parameters do?
What is the difference between providing probabilities and running in 'aggregated' mode? I find for my data I will not get results when using the probabilities but I will when running in 'aggregated' mode. From the intermediate files, it looks like 'aggregated' mode still calculates the probabilities?
My samples are IMR90 cells. Is it correct that the library you have for IMR90 data is only for replication_time and replication_time_strand_bias?
I am interested in calculating observed / expected ratios using genome annotations such as ChromHMM. How can I do this with your program?
I see that the Epigenomics Occupancy analysis compares an observed signal with a simulated signal. Can you explain what this signal is? Is it the number of mutations, or does it have something to do with the signal from the epigenetic assay?
Thanks a lot! Ronnie