ModelSEED / ModelSEEDpy

Python package for building and analyzing models using ModelSEED

Created Weighting function for gap fill. Example data needed for testing function. #140

Closed by jjacobson95 7 months ago

jjacobson95 commented 11 months ago

The function has been tested with small-scale dummy data, but real example data is needed.
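
For illustration, small-scale dummy data of the kind described might look like the sketch below; the table layout, column names, and gene IDs are assumptions for testing purposes, not the function's documented input format.

import pandas as pd

# Hypothetical dummy expression table: one row per gene, one column per
# condition. Gene IDs, column names, and values are made up for testing only.
dummy_expression = pd.DataFrame({
    "gene_id": ["b0001", "b0002", "b0003"],
    "glucose_minimal": [12.5, 0.0, 3.2],
    "phenylalanine_minimal": [10.1, 0.4, 2.9],
})
dummy_expression.to_csv("DummyExpressionData.tsv", sep="\t", index=False)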

jjacobson95 commented 10 months ago

@cshenry Waiting on example data for the function. The file is linted and should pass the tests now.

jjacobson95 commented 9 months ago

@cshenry This should be ready to go. Please test this yourself at your earliest convenience. Happy to make any changes.

Example run (this assumes pre-initialized msrecon and annoapi client objects, plus the AnnotationOntology and MSGapfill classes from the ModelSEEDpy/KBase environment):

import pandas as pd

com_media = msrecon.get_media("KBaseMedia/Complete")
gmm_media = msrecon.get_media("KBaseMedia/Carbon-D-Glucose")
auxo_media = msrecon.get_media("94026/Auxotrophy_media") 

# Pulling a heavily annotated E. coli genome
genome_ref = "77537/Eco_RAST_Prokka_BlastKOALA_PTools_DeepEC_DeepGO"
# Link to the AnnotationOntology code on GitHub:
annoont = AnnotationOntology.from_kbase_data(annoapi.get_annotation_ontology_events({
    "input_ref" : genome_ref
}),genome_ref)
# Pulling the E. coli model
model = msrecon.get_model("151253/GCF_000005845.2.RAST.NewGMM.mdl")
media = msrecon.get_media("KBaseMedia/Carbon-L-Phenylalanine")
model.pkgmgr.getpkg("KBaseMediaPkg").build_package(media)
# Loading the transcriptome
expression = pd.read_csv("ExpressionData.tsv",sep="\t")

# Creating the MSGapfill object
msgapfill = MSGapfill(
    model,
    [msrecon.get_template(model.model.template_ref)],
    [],
    [],
    blacklist=[],
    default_target="bio1",
    minimum_obj=0.01
)

# Derive reaction-level weights from the expression data and annotation ontology
msgapfill.reaction_scores = msgapfill.compute_reaction_weights_from_expression_data(expression, annoont)
# Fold the weights into the gapfilling penalties and rebuild the gapfilling objective
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").compute_gapfilling_penalties(reaction_scores=msgapfill.reaction_scores)
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").build_gapfilling_objective_function()

msgapfill.run_multi_gapfill([media], target="bio1")
            
cshenry commented 7 months ago

The code seems to be INCREDIBLY SLOW for what it's doing... taking more than 10 minutes to compute the scores... doesn't feel right to me...

cshenry commented 7 months ago

I think it's all this numpy and dataframe stuff... it's just killing your performance. A nested hash really is a vastly better data structure here.
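
For illustration only (this is not the package's actual implementation, and the column names and scores below are made up), the difference between the two lookup styles is roughly:

import pandas as pd

scores_df = pd.DataFrame({
    "gene": ["g1", "g1", "g2"],
    "reaction": ["rxn00001", "rxn00002", "rxn00001"],
    "score": [0.9, 0.1, 0.5],
})

# DataFrame-style lookup: filters the whole frame on every call.
def score_from_df(gene, rxn):
    hit = scores_df[(scores_df["gene"] == gene) & (scores_df["reaction"] == rxn)]
    return float(hit["score"].iloc[0]) if not hit.empty else 0.0

# Nested-hash lookup: build the dict once, then each access is a constant-time lookup.
scores = {}
for row in scores_df.itertuples(index=False):
    scores.setdefault(row.gene, {})[row.reaction] = row.score

def score_from_hash(gene, rxn):
    return scores.get(gene, {}).get(rxn, 0.0)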

jjacobson95 commented 7 months ago

I’ll make those changes this weekend or early next week.

jjacobson95 commented 7 months ago

I have a fix for this that I'll be pushing shortly. It should reduce the runtime to about a minute.