MannLabs / structuremap

Python package for investigating the structural context of PTMs
Apache License 2.0

Clearer documentation needed for tutorial examples #16

Open kamurani opened 1 year ago

kamurani commented 1 year ago

In the tutorial.ipynb workflow, a file is loaded from data/test_files/ptm_file.csv, which contains a set of sites and the known PTMs associated with each site (e.g. p, ub, m).

There is also a *_reg column for some of these sites. However, it's not explained what this means, and I'm unsure to what extent these extra columns are used in the downstream analysis.

For example, in perform_enrichment_analysis_per_protein we supply a ptm_dict which, to my understanding, just tells the function which residues to use for the "random" background generation (i.e. S, T and Y residues that are not necessarily modified are analysed to see whether their structural properties differ statistically from those of the known phosphorylation sites). But is p_reg also important for the enrichment analysis here? Are these the background residues?

Thanks in advance!

ibludau commented 8 months ago

Hi, thanks for your message and sorry for the delay in my reply. The _reg columns in ptm_file.csv specify sites with a known regulatory function. So if you don't want to look at all modified sites but only at a subgroup of known regulatory sites, you can use those. For any follow-up analysis you could also use e.g. all p sites as background and the p_reg sites as target to see specific trends for regulatory sites against the background of all p sites. But this is not necessary for the general functions shown in the tutorial. And yes, the ptm_dict only specifies the possible residues for a modification. The p_reg sites could be used instead of the p sites in this analysis, but they don't have any other function. I hope this answers your questions :)
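
To make the target/background idea concrete, here is a minimal sketch in plain Python. It does not call structuremap at all. The inline table is a made-up stand-in for ptm_file.csv: the column names protein_id, AA and position, the 0/1 encoding, and the in_idr structural flag are all assumptions for illustration, as are the helper names select_sites and fraction_in_idr.

```python
import csv
import io

# Toy stand-in for ptm_file.csv. Column names, the 0/1 encoding and the
# "in_idr" structural flag are assumptions, not the real file layout.
TOY_CSV = """protein_id,AA,position,p,p_reg,in_idr
P1,S,10,1,1,1
P1,T,25,1,0,0
P1,Y,40,0,0,0
P1,K,55,0,0,1
P2,S,7,1,1,1
P2,S,90,1,0,1
P2,T,120,0,0,0
"""

# The ptm_dict only states which residues a modification can occur on.
ptm_dict = {"p": ["S", "T", "Y"], "p_reg": ["S", "T", "Y"]}

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))


def select_sites(rows, column, allowed_residues):
    """Hypothetical helper: rows flagged in `column` on an allowed residue."""
    return [r for r in rows if r["AA"] in allowed_residues and r[column] == "1"]


def fraction_in_idr(sites):
    """Hypothetical helper: share of sites carrying the structural flag."""
    return sum(int(r["in_idr"]) for r in sites) / len(sites)


# All S/T/Y residues, modified or not: the pool of possible p sites.
candidates = [r for r in rows if r["AA"] in ptm_dict["p"]]

# Follow-up comparison: regulatory sites (target) vs. all p sites (background).
p_sites = select_sites(rows, "p", ptm_dict["p"])
p_reg_sites = select_sites(rows, "p_reg", ptm_dict["p_reg"])

print(len(candidates), len(p_sites), len(p_reg_sites))
print(fraction_in_idr(p_sites), fraction_in_idr(p_reg_sites))
```

In a real analysis the two fractions would be compared with a proper statistical test rather than read off directly; the sketch only shows which rows play the role of target and which the role of background.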