jjacobson95 closed this 7 months ago.
@cshenry Waiting on example data for the function. The file is linted and should pass tests now.
@cshenry This should be ready to go. Please test this yourself at your earliest convenience. Happy to make any changes.
Example run:
```python
import pandas as pd
from modelseedpy import MSGapfill

# Assumes a KBase notebook session where msrecon, annoapi, and AnnotationOntology
# are already defined/imported.
com_media = msrecon.get_media("KBaseMedia/Complete")
gmm_media = msrecon.get_media("KBaseMedia/Carbon-D-Glucose")
auxo_media = msrecon.get_media("94026/Auxotrophy_media")

# Pulling super annotated E. coli genome
genome_ref = "77537/Eco_RAST_Prokka_BlastKOALA_PTools_DeepEC_DeepGO"
# Link to AnnotationOntology code in github:
annoont = AnnotationOntology.from_kbase_data(
    annoapi.get_annotation_ontology_events({"input_ref": genome_ref}), genome_ref
)

# Pulling E. coli model and setting its media
model = msrecon.get_model("151253/GCF_000005845.2.RAST.NewGMM.mdl")
media = msrecon.get_media("KBaseMedia/Carbon-L-Phenylalanine")
model.pkgmgr.getpkg("KBaseMediaPkg").build_package(media)

# Loading transcriptome
expression = pd.read_csv("ExpressionData.tsv", sep="\t")

# Getting MSGapfill object
msgapfill = MSGapfill(
    model,
    [msrecon.get_template(model.model.template_ref)],
    [],
    [],
    blacklist=[],
    default_target="bio1",
    minimum_obj=0.01,
)

# Score reactions from expression data, compute gapfilling penalties from the scores,
# build the gapfilling objective, then run gapfilling on the selected media
msgapfill.reaction_scores = msgapfill.compute_reaction_weights_from_expression_data(expression, annoont)
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").compute_gapfilling_penalties(reaction_scores=msgapfill.reaction_scores)
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").build_gapfilling_objective_function()
msgapfill.run_multi_gapfill([media], target="bio1")
```
The code seems to be incredibly slow for what it's doing: it takes more than 10 minutes to compute the scores. That doesn't feel right to me.
I think it's all the NumPy and DataFrame manipulation; it's killing your performance. A nested hash really is a vastly better data structure for this.
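A minimal sketch of the nested-hash idea, in Python. It assumes the expression table has one gene-ID column plus one numeric column per condition. The helper names, the reaction-to-gene mapping, and the max-expression scoring rule are hypothetical placeholders, not how `compute_reaction_weights_from_expression_data` actually scores reactions. The point is to pay the DataFrame cost once, then do every per-gene lookup against plain dicts.

```python
import pandas as pd


def expression_to_nested_dict(expression: pd.DataFrame, id_column: str) -> dict:
    """Convert a gene-by-condition table into {gene_id: {condition: value}}."""
    # One pass over the DataFrame; afterwards every lookup is a dict hit
    return expression.set_index(id_column).to_dict(orient="index")


def score_reactions(reaction_genes: dict, nested: dict, condition: str) -> dict:
    """Hypothetical rule: a reaction scores the max expression of its known genes."""
    scores = {}
    for rxn_id, genes in reaction_genes.items():
        # Plain dict lookups instead of per-gene DataFrame indexing
        values = [nested[gene][condition] for gene in genes if gene in nested]
        if values:
            scores[rxn_id] = max(values)
    return scores


# Toy data: illustrative gene and reaction IDs, not real expression values
expression = pd.DataFrame(
    {
        "gene": ["b0001", "b0002", "b0003"],
        "cond1": [5.0, 1.0, 3.0],
        "cond2": [0.5, 2.0, 4.0],
    }
)
nested = expression_to_nested_dict(expression, "gene")
reaction_genes = {"rxn00001": ["b0001", "b0002"], "rxn00002": ["b0003", "b9999"]}
scores = score_reactions(reaction_genes, nested, "cond1")
# rxn00001 -> 5.0 (max of b0001, b0002); rxn00002 -> 3.0 (b9999 is not in the table)
```

In the real function the reaction-to-gene mapping would presumably come from the model's gene-reaction rules and the annotation ontology; only the lookup structure matters here.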
I’ll make those changes this weekend or early next week.
I have a fix for this that I'll be pushing shortly. It should reduce the runtime to about a minute.
The function has been tested with small-scale dummy data, but real example data is needed.