PathwayAndDataAnalysis / Finkle-PHYS-479

GNU Lesser General Public License v2.1

Iterative motif search #10

Open ozgunbabur opened 2 years ago

ozgunbabur commented 2 years ago

Write an iterative method that will start by looking for enrichments and deficiencies for each location and each amino acid. Then it will

Iterate this until nothing comes out as significant. This will give you a tree of results, but you will see that some of the nodes on the tree converge on the same motif, meaning it is actually a DAG.

Report each significant motif with its p-value and its parent-child relations.
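The procedure described above can be sketched roughly as follows. This is only an illustration under several assumptions, not the repository's code: the step that is cut off after "Then it will" is assumed to mean "restrict the data to the sequences matching each significant feature and repeat", the background frequency of each amino acid is assumed to be its pooled frequency over all positions of the input, and the significance test is a two-sided normal approximation to the binomial. The names `two_sided_p`, `matches`, `significant_features`, and `motif_search` are hypothetical.

```python
import math
from collections import Counter


def two_sided_p(k, n, p):
    """Two-sided p-value for k hits in n trials (normal approximation, an assumed test)."""
    sd = math.sqrt(n * p * (1 - p))
    if sd == 0:
        return 1.0
    return math.erfc(abs(k - n * p) / (sd * math.sqrt(2)))


def matches(seq, motif):
    """True if seq satisfies every (index, letter, presence) feature in motif."""
    return all((seq[i] == letter) == presence for i, letter, presence in motif)


def significant_features(subset, background, threshold):
    """Yield ((index, letter, presence), p_value) for enriched or depleted letters."""
    n = len(subset)
    for i in range(len(subset[0])):
        counts = Counter(seq[i] for seq in subset)
        for letter, freq in background.items():
            k = counts[letter]
            if k == 0 or k == n:
                continue  # splitting on this feature would not shrink the subset
            p_value = two_sided_p(k, n, freq)
            if p_value < threshold:
                # enrichment -> letter required (True); deficiency -> letter excluded (False)
                yield (i, letter, k > n * freq), p_value


def motif_search(seqs, threshold):
    """Return {motif: {child_motif: p_value}}; a motif is a frozenset of features."""
    pooled = Counter(ch for seq in seqs for ch in seq)
    total = sum(pooled.values())
    background = {letter: c / total for letter, c in pooled.items()}
    dag = {}
    pending = [frozenset()]  # the empty motif is the root (all sequences)
    while pending:
        motif = pending.pop()
        if motif in dag:
            continue  # already reached by another path: this is where the tree becomes a DAG
        subset = [seq for seq in seqs if matches(seq, motif)]
        dag[motif] = {
            motif | {feature}: p_value
            for feature, p_value in significant_features(subset, background, threshold)
        }
        pending.extend(dag[motif])
    return dag
```

Keying each node by the unordered set of its features means that two paths adding the same features in a different order land on the same node, so convergent branches are explored only once. The search terminates because every accepted feature strictly shrinks the subset.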

AdamFinkleUMB commented 2 years ago

Tentatively done, but I'm not sure the result is what you want.

ozgunbabur commented 2 years ago

Hi Adam, please describe what you have done for this issue and tell us how we can test it.

AdamFinkleUMB commented 2 years ago

I fixed the bug we saw today: I needed to convert the number of the motif into a character. The search now neatly returns a readable result if the threshold is kept low.

ozgunbabur commented 2 years ago

What is the result on the simulated dataset with window 5?

AdamFinkleUMB commented 2 years ago

path = "test_data/simulated-phosphoproteomic-data.txt"
window = 5
length = 2 * window + 1
step = 1024
threshold = 0.0005

Key: motif => [newfound_motifs], where each motif is a tuple (index, letter, presence). The index is zero-based, counted from the left end of the sequence; letter is the amino acid; and presence says whether the amino acid must be present (True) or absent (False) at that index.

22212 None => [(4, 'S', False), (5, 'P', True)]

18981 (4, 'S', False) => [(5, 'P', True), (9, 'I', False)]

1239 (5, 'P', True) => []

15209 (9, 'I', False) => [(4, 'H', True), (5, 'P', True), (7, 'K', True)]

440 (4, 'H', True) => []

1017 (5, 'P', True) => []

1918 (7, 'K', True) => [(5, 'P', True)]

106 (5, 'P', True) => []

1512 (5, 'P', True) => []

Final Graph: {None: (5, 'P', True)}
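As a reading aid for the key above, here is a small sketch of how an (index, letter, presence) tuple could be checked against a sequence and expressed relative to the middle of the window. It assumes that sequences have length 2 * window + 1, so the middle position is index `window`; whether that position is the phosphosite is not stated in this thread. The helper names `satisfies` and `describe` and the example sequence are hypothetical.

```python
def satisfies(seq, feature):
    """True if seq meets a single (index, letter, presence) constraint."""
    index, letter, presence = feature
    return (seq[index] == letter) == presence


def describe(feature, window):
    """Render a feature with its offset from the middle of the window."""
    index, letter, presence = feature
    offset = index - window  # 0 is the middle position when length == 2 * window + 1
    verb = "must be" if presence else "must not be"
    return f"{letter} {verb} at offset {offset:+d}"


window = 5
seq = "AAAAHPAKAAA"  # made-up 11-residue sequence, for illustration only
print(satisfies(seq, (5, "P", True)))    # P required at index 5
print(satisfies(seq, (4, "S", False)))   # S excluded at index 4
print(describe((5, "P", True), window))
print(describe((7, "K", True), window))
```

With window = 5, index 5 is the middle of the 11 positions, index 4 is one position to its left, and index 7 is two positions to its right.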

ozgunbabur commented 2 years ago

How should we read these? I would like to understand the resulting DAG structure.

AdamFinkleUMB commented 2 years ago

In this case the resulting acyclic graph has two nodes: the root, which holds all the original sequences, and one child reached by the single edge (5, "P", True), which holds only the sequences with a "P" at index 5. The method itself works, and understanding the DAG structure can be part of the visualization issue.
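For the visualization side, one option is to export the graph as Graphviz DOT text. The sketch below assumes a hypothetical representation in which the DAG is a dict mapping each motif (a frozenset of (index, letter, presence) tuples) to its child motifs; this is not necessarily how the code in this repository stores it, and the function names are made up for the example.

```python
def node_label(motif):
    """Human-readable label for a motif; the empty motif is the root."""
    if not motif:
        return "all sequences"
    return " & ".join(
        f"{'' if present else 'not '}{letter}@{index}"
        for index, letter, present in sorted(motif)
    )


def to_dot(dag):
    """Return Graphviz DOT text for {motif: iterable of child motifs}."""
    nodes = set(dag) | {child for kids in dag.values() for child in kids}
    ordered = sorted(nodes, key=lambda m: (len(m), sorted(m)))
    ids = {motif: f"n{k}" for k, motif in enumerate(ordered)}
    lines = ["digraph motifs {"]
    for motif in ordered:
        lines.append(f'  {ids[motif]} [label="{node_label(motif)}"];')
    for parent in ordered:
        for child in sorted(dag.get(parent, ()), key=sorted):
            # label the edge with the feature(s) the child adds to the parent
            added = node_label(child - parent)
            lines.append(f'  {ids[parent]} -> {ids[child]} [label="{added}"];')
    lines.append("}")
    return "\n".join(lines)


# The two-node graph described above: the root plus one child via (5, "P", True).
example = {frozenset(): [frozenset({(5, "P", True)})]}
print(to_dot(example))
```

Because each motif gets exactly one node id, a motif reached from several parents shows up as a single node with several incoming edges, which makes any convergence visible in the drawing.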

ozgunbabur commented 2 years ago

Let me give an example: The output above produces "(5, 'P', True) => []" twice in the last steps. Why is that?

How can we look at this output and draw the DAG?

Also, where does the index 5 map on the sequence? Is it the center?