Micromeda / pygenprop

A python library for programmatic usage of EBI InterPro Genome Properties.
http://pygenprop.rtfd.io/
Apache License 2.0

Can pygenprop replace assign_genome_properties.pl ? #32

Closed SilasK closed 5 years ago

SilasK commented 5 years ago

If I understand your code correctly, pygenprop can parse the long-format output of assign_genome_properties.pl from Genome Properties, but there is no script to infer genome properties directly from InterProScan output.

LeeBergstrand commented 5 years ago

Hi @SilasK,

That is correct. The original plan was simply to parse the long-format output from assign_genome_properties.pl, and that is what the code currently does.

However, I noticed that the long-format file only contains results for lower-level genome properties such as Systems and Pathways, not for all types of genome properties in the tree (e.g. Categories). See the diagram below:

[Image: genome properties types]

Since my visualization software uses all levels of genome properties, I had to write code to do my own assignments for higher-level properties.

My assignment code can be found in this file: https://github.com/Micromeda/pygenprop/blob/master/pygenprop/results.py

Specifically, the following functions:

These functions could potentially be used to assign genome property results from InterProScan output.
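As a rough illustration of the idea of assigning higher-level properties from lower-level results, here is a minimal sketch. It is not the code in results.py: the tree, the function name `assign`, and the combining rule (YES if all children are YES, NO if all are NO, otherwise PARTIAL) are simplifying assumptions made up for this example.

```python
from typing import Dict, List

# Hypothetical toy tree: parent property -> child properties.
# Leaves (empty lists) stand in for Systems/Pathways found in the long-format file.
CHILDREN: Dict[str, List[str]] = {
    "Category_A": ["Pathway_1", "System_2"],
    "Pathway_1": [],
    "System_2": [],
}


def assign(prop: str, leaf_results: Dict[str, str],
           children: Dict[str, List[str]] = CHILDREN) -> str:
    """Return YES, PARTIAL or NO for a property, working upward from its children.

    Assumed rule (illustrative only): all children YES -> YES,
    all children NO -> NO, anything else -> PARTIAL.
    """
    kids = children.get(prop, [])
    if not kids:
        # Leaf property: use the parsed result, defaulting to NO if absent.
        return leaf_results.get(prop, "NO")
    child_results = [assign(kid, leaf_results, children) for kid in kids]
    if all(result == "YES" for result in child_results):
        return "YES"
    if all(result == "NO" for result in child_results):
        return "NO"
    return "PARTIAL"


print(assign("Category_A", {"Pathway_1": "YES", "System_2": "NO"}))
```

The recursion means that a result for any level of the tree can be derived once the leaf-level results are known.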

LeeBergstrand commented 5 years ago

To make assignments directly from InterProScan results, we would need to do the following. Note: this is based on my basic understanding of the Perl code in assign_genome_properties.pl. I still need to reverse engineer it further to understand it better.

LeeBergstrand commented 5 years ago

@SilasK Prototype code is here: https://github.com/Micromeda/pygenprop/blob/assign_from_interpro_scan/prototype_assign_from_interproscan.ipynb

LeeBergstrand commented 5 years ago

Looks like there are some anomalies. I am investigating.

SilasK commented 5 years ago

Great, I will look at it. I found out that assign_genome_properties.pl uses only the annotations (Pfam, TIGRFAM, ...) and apparently not the InterPro IDs.
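To make that concrete, here is a small sketch of pulling the signature accessions out of an InterProScan TSV file. It assumes the standard tab-separated layout, where column 4 names the analysis (Pfam, TIGRFAM, ...), column 5 holds the signature accession, and the InterPro accession only appears later, in column 12. The function name `signature_accessions` and the example rows are made up for illustration; this is neither pygenprop's parser nor what the Perl script does internally.

```python
import io
from typing import Iterable, Set


def signature_accessions(tsv_lines: Iterable[str]) -> Set[str]:
    """Collect the signature accessions (column 5) from InterProScan TSV lines."""
    accessions = set()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or truncated rows that lack a signature accession column.
        if len(fields) >= 5 and fields[4]:
            accessions.add(fields[4])
    return accessions


# Illustrative rows only; the values are placeholders, not real results.
example = io.StringIO(
    "prot1\tmd5a\t300\tPfam\tPF00001\tdesc\t1\t100\t1e-5\tT\t11-12-2018\tIPR000001\tdesc\n"
    "prot1\tmd5a\t300\tTIGRFAM\tTIGR00001\tdesc\t5\t90\t1e-9\tT\t11-12-2018\tIPR000002\tdesc\n"
)
print(sorted(signature_accessions(example)))
```

The resulting set of Pfam/TIGRFAM identifiers is the kind of input that could then be matched against the evidence listed for each genome property step.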

On Tue, Dec 11, 2018, 02:46 Lee Bergstrand notifications@github.com wrote:

Looks like there are some anomalies. I am investigating.


LeeBergstrand commented 5 years ago

@SilasK https://github.com/ebi-pf-team/genome-properties/issues/30

LeeBergstrand commented 5 years ago

@SilasK Completed in https://github.com/Micromeda/pygenprop/pull/33

LeeBergstrand commented 5 years ago

Still on the develop branch. I'm going to be working on documentation.

LeeBergstrand commented 5 years ago

Summary can be found here.

https://github.com/Micromeda/pygenprop/blob/d284f2bb26adfab2035f2eefd1c7d7f5ada07c29/pygenprop/testing/compare_assignment_to_assign_properties_perl.ipynb

There are some differences; however, these are due to assign_genome_properties.pl not working correctly.
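For anyone who wants to repeat this kind of comparison on their own data, a minimal sketch follows. It assumes each tool's assignments have already been loaded into a plain dict mapping property identifier to YES/PARTIAL/NO. The function name `diff_assignments` and the sample values are hypothetical and are not taken from the notebook.

```python
from typing import Dict, Optional, Tuple


def diff_assignments(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the properties whose assignment differs between two result sets.

    A property missing from one side is reported with None for that side.
    """
    differences = {}
    for prop in sorted(set(first) | set(second)):
        pair = (first.get(prop), second.get(prop))
        if pair[0] != pair[1]:
            differences[prop] = pair
    return differences


# Hypothetical sample values, for illustration only.
perl_results = {"GenProp0001": "YES", "GenProp0002": "NO", "GenProp0003": "PARTIAL"}
python_results = {"GenProp0001": "YES", "GenProp0002": "PARTIAL"}

for prop, (perl_value, python_value) in diff_assignments(perl_results, python_results).items():
    print(prop, perl_value, python_value)
```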