Remap goterms with has-part

rmflight commented 1 year ago

I've taken the build_ancestors script that was written for the statistical power manuscript (and which I use regularly), and added it as a sub-command of the gocats CLI.

This way it's part of gocats itself, and only depends on installing the package.

I've compared the outputs of this to those of the build_ancestors script, and although the records are written in a different order, the contents of both the gene-term mapping and the term-ontology mapping are identical.
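An order-insensitive comparison like the one described above could be sketched as follows. This is only an illustration, assuming each output is a plain-text file with one record per line; the actual output format is not shown in this thread, and `same_records` is a hypothetical helper, not part of GOcats.

```python
import os
import tempfile
from collections import Counter


def same_records(path_a, path_b):
    """Return True if both files hold the same lines, ignoring line order.

    Hypothetical helper: assumes one record per line (an assumption about
    the output format, not something confirmed by the GOcats code).
    """
    with open(path_a) as file_a, open(path_b) as file_b:
        # Counter keeps duplicate lines distinct, unlike a plain set.
        records_a = Counter(line.rstrip("\n") for line in file_a)
        records_b = Counter(line.rstrip("\n") for line in file_b)
    return records_a == records_b


# Small demonstration with made-up records written in different orders.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old_output.txt")
    new_path = os.path.join(tmp, "new_output.txt")
    with open(old_path, "w") as handle:
        handle.write("geneA\tGO:0000001\ngeneB\tGO:0000002\n")
    with open(new_path, "w") as handle:
        handle.write("geneB\tGO:0000002\ngeneA\tGO:0000001\n")
    outputs_match = same_records(old_path, new_path)
```

A line-level comparison would not catch records whose fields are themselves reordered (for example, a list of terms inside one line); that case would need the records to be parsed first.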

I have not updated the package documentation yet. I first wanted to make sure the CLI argument names I chose make sense. It was hard to work out what the argument names for the other CLI sub-commands referred to, as I have not used the gocats CLI much.

rmflight commented 1 year ago

OK, just to make sure I did this right. Note, I did test this and made sure it works, but before committing and pushing, I figured I would double check the code fits what we discussed.

Here are my changes to main.py:

def main(args):
    if args['create_subgraphs']:
        gocats.create_subgraphs(args)
    elif args['categorize_dataset']:
        gocats.categorize_dataset(args)
    elif args['remap_goterms']:
        go_database = args['<go_database>']
        goa_gaf = args['<goa_gaf>']
        ancestor_filename = args['<ancestor_filename>']
        namespace_filename = args['<namespace_filename>']
        if args['--allowed_relationships']:
            allowed_relationships = args['--allowed_relationships'].split(",")
        else:
            allowed_relationships = ["is_a", "part_of", "has_part"]
        if args['--identifier_column']:
            identifier_column = int(args['--identifier_column'])
        else:
            identifier_column = 1

        gocats.remap_goterms(go_database, goa_gaf, ancestor_filename, namespace_filename, allowed_relationships, identifier_column)

And then in gocats.py:


def remap_goterms(go_database, goa_gaf, ancestor_filename, namespace_filename, allowed_relationships, identifier_column):
    """Reads in a Gene Ontology relationship file, and a Gene Annotation File (GAF), and
    follows the GOcats rules for allowed term-to-term relationships. Generates as output
    a new GAF, and a new term to ontology namespace mapping.

    :param go_database: the gene ontology dataset
    :param goa_gaf: the gene annotation file
    :param ancestor_filename: the output file containing new gene to ontology mappings
    :param namespace_filename: the output file containing the term to ontology mappings
    :param allowed_relationships: what term to term relationships will be considered (is_a,part_of,has_part) 
    :param identifier_column: which column is being used for the gene identifiers (1)
    :return: None
    :rtype: :py:obj:`None`
    """

    if type(allowed_relationships) == str:
        allowed_relationships = allowed_relationships.split(",")

    if type(identifier_column) == str:
        identifier_column = int(identifier_column)

.... continuing, and using the variables instead of args

hunter-moseley commented 1 year ago

The "if type(allowed_relationships) == str" check and the similar check on identifier_column are not needed in the function, since the API and CLI are cleanly separated.

Besides this, it is perfect!
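A minimal sketch of that separation, assuming the CLI layer receives a docopt-style dict in which option values are strings or None: all string-to-type conversion happens on the CLI side, so the API function can assume it receives a list and an int. `parse_remap_args` is a hypothetical helper used only for illustration, not an existing GOcats function; the option names and defaults are the ones shown in main.py above.

```python
def parse_remap_args(args):
    """Convert raw CLI option strings into typed values for the API.

    Hypothetical helper: `args` is assumed to be a docopt-style dict whose
    option values are either strings or None.
    """
    raw_relationships = args.get('--allowed_relationships')
    raw_column = args.get('--identifier_column')

    # Comma-separated string -> list, falling back to the default from main.py.
    if raw_relationships:
        allowed_relationships = raw_relationships.split(",")
    else:
        allowed_relationships = ["is_a", "part_of", "has_part"]

    # String -> int, falling back to the default column from main.py.
    identifier_column = int(raw_column) if raw_column else 1

    return allowed_relationships, identifier_column


# Values as a CLI parser would hand them over.
cli_args = {'--allowed_relationships': 'is_a,part_of', '--identifier_column': '2'}
relationships, column = parse_remap_args(cli_args)

# With no options given, the defaults apply.
defaults = parse_remap_args({'--allowed_relationships': None, '--identifier_column': None})
```

With the conversion isolated like this, callers of the Python API pass a list and an int directly, and the function body needs no type checks.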

-- Hunter Moseley, Ph.D. -- Univ. of Kentucky Professor, Dept. of Molec. & Cell. Biochemistry / Markey Cancer Center / Institute for Biomedical Informatics / UK Superfund Research Center Not just a scientist, but a fencer as well. My foil is sharp, but my mind sharper still.

Email: @. (work) @. (personal) Phone: 859-218-2964 (office) 859-218-2965 (lab) 859-257-7715 (fax) Web: http://bioinformatics.cesb.uky.edu/ Address: CC434 Roach Building, 800 Rose Street, Lexington, KY 40536-0093

rmflight commented 1 year ago

I think it's done. I've tested it all, and everything seems to be working, and the docs are all up to date.

MoseleyBioinformaticsLab / GOcats

Remap goterms with has-part #23
