SchulzLab / STARE

TF analysis from epigenetic and Hi-C data
MIT License

Questions about doing STARE Analysis Using Existing ABC Score Results #7

Closed: Gemma-Zhang-326 closed this issue 9 months ago

Gemma-Zhang-326 commented 9 months ago

Dear Author,

I am deeply grateful that you developed such an outstanding tool. I have a question that might seem naive. I have already performed an ABC analysis on my data and obtained a series of result files from it. I noticed that passing an existing ABC-score file with -r should meet my needs. However, I am confused about how to interpret the column named intergenicScore mentioned in the tutorial. Specifically, I am unsure which columns in the example files, named 'signalValue', 'Contact', 'adjustedActivity', 'scaledContact', 'intergenicScore', correspond to which columns in the ABC result files.

Additionally, in my case, what type of file should I specify as the '-b region_file' input? Is it suitable to use the Neighborhoods/EnhancerList.txt from the ABC result files and select columns named 'ATAC.RPKM' or 'activity_base' as the activity column?

I look forward to your assistance and kind reply.

Best, Gemma

DennisHeck commented 9 months ago

Dear Gemma,

I am very happy to hear that you find the tool useful! The 'intergenicScore' column is admittedly not very intuitive. It is used when summarising the TF affinities per gene, and its value depends on how the ABC-scoring was done. If the ABC-scoring was run with the adapted activity (-q True), the 'intergenicScore' in the ABC result file is the same as the 'adaptedActivity' column. If it was run without the adapted activity (-q False), the 'intergenicScore' is the activity of a region multiplied by its contact, divided by the maximum contact found. You can find the equations here: it's the part in the 'otherwise' condition of the equations under 'ABC-scoring' and 'Not-adapted ABC-scoring', respectively. Apart from its use in the TF affinity summarisation, I don't think the 'intergenicScore' is of any use; it doesn't really have a meaning on its own.
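As a rough illustration of the non-adapted case (-q False) described above, the sketch below computes activity × contact / max(contact) for the candidate regions of a single gene. It assumes the maximum is taken over that gene's regions and applies no further scaling; `intergenic_score_not_adapted` is a hypothetical helper, not STARE code, and the equations in the STARE documentation remain the reference.

```python
def intergenic_score_not_adapted(activities, contacts):
    """Hypothetical sketch: activity * contact / max(contact) for one gene's regions.

    Assumes the maximum contact is taken over the regions of the same gene.
    """
    if len(activities) != len(contacts):
        raise ValueError("activities and contacts must have the same length")
    max_contact = max(contacts)
    if max_contact == 0:
        # No contact at all: every region gets a score of 0 in this sketch.
        return [0.0 for _ in activities]
    return [a * c / max_contact for a, c in zip(activities, contacts)]


# Two regions of one gene: contact 50 is the maximum, so the first region keeps its full activity.
print(intergenic_score_not_adapted([10.0, 4.0], [50.0, 25.0]))  # [10.0, 2.0]
```

The region with the highest contact keeps its activity unchanged, while every other region is down-weighted in proportion to its contact.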

> Specifically, I am unsure which columns in the example files, named 'signalValue', 'Contact', 'adjustedActivity', 'scaledContact', 'intergenicScore', correspond to which columns in the ABC result files.

The 'adjustedActivity' column in the documentation should actually be 'adaptedActivity'. That name is left over from an older version, and I missed updating it in the documentation; I'll fix that. Other than that, I'm not sure I fully understand the question. The columns in the test files should be the same as those shown in this table here. Or which example files are you referring to?

> Additionally, in my case, what type of file should I specify as the '-b region_file' input? Is it suitable to use the Neighborhoods/EnhancerList.txt from the ABC result files and select columns named 'ATAC.RPKM' or 'activity_base' as the activity column?

The region_file should be the same one you used for the ABC-scoring, so that the TF affinities can later be mapped to the interactions in the -r ABC-files. The activity column should also be the same as the one given during the ABC-scoring. When you provide existing ABC-files with -r, STARE searches for file names that match the column names in the bed-file (-b), or the column indices if there is no header.
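Because the TF affinities are mapped to the interactions through the regions, one quick sanity check is to confirm that every region in the -r ABC file also appears in the -b region file. This is a minimal sketch, assuming tab-separated files with chromosome, start and end in the first three columns; the helper names and file paths are hypothetical and not part of STARE.

```python
import gzip


def parse_regions(lines):
    """Collect (chrom, start, end) tuples from tab-separated lines.

    Assumes chromosome, start and end are the first three columns; header lines
    (starting with '#' or without a numeric start) are skipped.
    """
    regions = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 3 or not fields[1].isdigit():
            continue
        regions.add((fields[0], int(fields[1]), int(fields[2])))
    return regions


def read_regions(path):
    """Open a plain or gzipped text file and parse its regions."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_regions(handle)


def regions_missing_from_bed(abc_path, bed_path):
    """Regions present in the ABC file but absent from the -b region file."""
    return read_regions(abc_path) - read_regions(bed_path)
```

For example, `regions_missing_from_bed("my_abc_scores.txt.gz", "my_regions.bed")` returning an empty set would mean every ABC region has an exact match. Since the comparison is by exact string and coordinate, chromosome names must be written the same way in both files (e.g. `21` versus `chr21`).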

I hope that helps, let me know if anything is unclear.

Best wishes, Dennis

Gemma-Zhang-326 commented 9 months ago

Dear Dennis,

Thank you very much for your prompt and detailed response! I realize now that I had mixed something up. The test file I should refer to, specifically for passing an existing ABC-score file with -r, is Test/Test_Data/example_ABCpp_scoredInteractions_c4.txt.gz. This file, used in your Test_V11, has the following format:

#chr  Peak_Start  Peak_End  Ensembl ID  Gene Name  PeakID        signalValue  Contact    adjustedActivity  scaledContact  intergenicScore  TSS-dist  ABC-Score
21    1200        1600      Gene6       OR4F9      21:1200-1600  269.000000   62.102081  99.667157         100.000000     99.667157        88400     0.730211
21    500         550       Gene6       OR4F9      21:500-550    50.000000    62.102081  18.525494         100.000000     18.525494        89450     0.135727
21    5000        5500      Gene6       OR4F9      21:5000-5500  20.000000    15.102081  1.304082          24.318156      1.304082         84500     0.002323
21    7600        7900      Gene6       OR4F9      21:7600-7900  123.000000   15.102081  8.020103          24.318156      8.020103         82100     0.014289
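The example rows are consistent with the explanation of intergenicScore earlier in the thread: intergenicScore equals adjustedActivity in every row, as expected for a run with the adapted activity (-q True), and scaledContact appears to be Contact divided by the gene's largest Contact, times 100 (e.g. 15.102081 / 62.102081 × 100 ≈ 24.318156). The sketch below checks both relations on a tab-separated table. It assumes the header labels shown above, with `Ensembl ID` as a single tab-delimited field; `check_rows` is a hypothetical helper, not STARE code.

```python
import csv
import io
from collections import defaultdict


def check_rows(text, tol=1e-4):
    """Check two relations in a STARE-style ABC table given as tab-separated text.

    Returns (intergenic_equals_adjusted, scaled_contact_consistent).
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    # Largest contact per gene, used to rescale each row's contact to 0..100.
    max_contact = defaultdict(float)
    for row in rows:
        gene = row["Ensembl ID"]
        max_contact[gene] = max(max_contact[gene], float(row["Contact"]))
    intergenic_ok = True
    scaled_ok = True
    for row in rows:
        if abs(float(row["intergenicScore"]) - float(row["adjustedActivity"])) > tol:
            intergenic_ok = False
        top = max_contact[row["Ensembl ID"]]
        expected = 100.0 * float(row["Contact"]) / top if top > 0 else 0.0
        if abs(float(row["scaledContact"]) - expected) > tol:
            scaled_ok = False
    return intergenic_ok, scaled_ok
```

To run it on the test file, read it with `gzip.open(path, "rt")` and pass the text in; a result of `(True, True)` means both relations hold within the tolerance.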

Since I have my own ABC results, adapted to the specific needs of my research, I would like to manually rearrange them into a format that STARE can recognize, matching the format of example_ABCpp_scoredInteractions_c4.txt.gz. However, I still don't fully understand how to do this.

Thanks to your kind and thorough explanation, I believe I now have a clear understanding of the activity column in the input bed files.

Best, Gemma

DennisHeck commented 9 months ago

Dear Gemma,

In your own file you'd need the following columns:

The columns don't have to be in a specific order. As long as your file has a header with labels for these columns, STARE should find them.
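Since STARE locates the columns by their header labels, their order should not matter. The sketch below illustrates that idea with a header-based lookup, plus a helper to relabel the header of your own ABC output. The `mapping` is a placeholder you would fill with your own column names and the labels STARE expects; neither function is STARE's code.

```python
import csv
import io


def rename_header(text, mapping, delimiter="\t"):
    """Return `text` with its header labels renamed via `mapping`.

    `mapping` is a hypothetical {your_label: stare_label} dict; column order
    and data rows are left unchanged.
    """
    lines = text.splitlines()
    if not lines:
        return text
    header = [mapping.get(label, label) for label in lines[0].split(delimiter)]
    return "\n".join([delimiter.join(header)] + lines[1:]) + "\n"


def column_values(text, label, delimiter="\t"):
    """Return the values of the column called `label`, wherever it sits."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row[label] for row in reader]


# The same column is found regardless of its position.
table_a = "Contact\tsignalValue\n62.1\t269.0\n"
table_b = "signalValue\tContact\n269.0\t62.1\n"
print(column_values(table_a, "Contact"), column_values(table_b, "Contact"))  # ['62.1'] ['62.1']
```

Renaming only touches the header line, so the data rows pass through untouched and the column order stays as it was.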

Let me know if it doesn't work or if there are questions.

Best wishes, Dennis

Gemma-Zhang-326 commented 9 months ago

Dear Dennis,

Thank you for your patience and meticulous explanation regarding the parameters. I also plan to use the adapted ABC-score method mentioned in your article for comparison purposes. Again, I appreciate your assistance!

Yours, Gemma