ModelSEED / ModelSEEDpy

Python package for building and analyzing models using ModelSEED

Created Weighting function for gap fill. Example data needed for testing function. #140

Closed by jjacobson95 7 months ago

jjacobson95 commented 11 months ago

The function has been tested with small-scale dummy data, but real example data is needed.
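
For illustration, small-scale dummy data of the kind described might look like the sketch below; the table layout, column names, and gene IDs are assumptions for testing purposes, not the function's documented input format.

import pandas as pd

# Hypothetical dummy expression table: one row per gene, one column per
# condition. Gene IDs, column names, and values are made up for testing only.
dummy_expression = pd.DataFrame({
    "gene_id": ["b0001", "b0002", "b0003"],
    "glucose_minimal": [12.5, 0.0, 3.2],
    "phenylalanine_minimal": [10.1, 0.4, 2.9],
})
dummy_expression.to_csv("DummyExpressionData.tsv", sep="\t", index=False)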

jjacobson95 commented 10 months ago

@cshenry Waiting on example data for the function. The file is linted and should pass the tests now.

jjacobson95 commented 9 months ago

@cshenry This should be ready to go. Please test this yourself at your earliest convenience. Happy to make any changes.

Example run (this assumes pre-initialized msrecon and annoapi client objects, plus the AnnotationOntology and MSGapfill classes from the ModelSEEDpy/KBase environment):

import pandas as pd

com_media = msrecon.get_media("KBaseMedia/Complete")
gmm_media = msrecon.get_media("KBaseMedia/Carbon-D-Glucose")
auxo_media = msrecon.get_media("94026/Auxotrophy_media") 

# Pulling a heavily annotated E. coli genome
genome_ref = "77537/Eco_RAST_Prokka_BlastKOALA_PTools_DeepEC_DeepGO"
# Link to the AnnotationOntology code on GitHub:
annoont = AnnotationOntology.from_kbase_data(annoapi.get_annotation_ontology_events({
    "input_ref" : genome_ref
}),genome_ref)
# Pulling the E. coli model
model = msrecon.get_model("151253/GCF_000005845.2.RAST.NewGMM.mdl")
media = msrecon.get_media("KBaseMedia/Carbon-L-Phenylalanine")
model.pkgmgr.getpkg("KBaseMediaPkg").build_package(media)
# Loading the transcriptome
expression = pd.read_csv("ExpressionData.tsv",sep="\t")

# Creating the MSGapfill object
msgapfill = MSGapfill(
    model,
    [msrecon.get_template(model.model.template_ref)],
    [],
    [],
    blacklist=[],
    default_target="bio1",
    minimum_obj=0.01
)

# Derive reaction-level weights from the expression data and annotation ontology
msgapfill.reaction_scores = msgapfill.compute_reaction_weights_from_expression_data(expression, annoont)
# Fold the weights into the gapfilling penalties and rebuild the gapfilling objective
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").compute_gapfilling_penalties(reaction_scores=msgapfill.reaction_scores)
msgapfill.gfpkgmgr.getpkg("GapfillingPkg").build_gapfilling_objective_function()

msgapfill.run_multi_gapfill([media], target="bio1")
            
cshenry commented 7 months ago

The code seems to be INCREDIBLY SLOW for what it's doing... taking more than 10 minutes to compute the scores... doesn't feel right to me...

cshenry commented 7 months ago

I think it's all this numpy and dataframe stuff... it's just killing your performance. A nested hash really is a vastly better data structure here.
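
For illustration only (this is not the package's actual implementation, and the column names and scores below are made up), the difference between the two lookup styles is roughly:

import pandas as pd

scores_df = pd.DataFrame({
    "gene": ["g1", "g1", "g2"],
    "reaction": ["rxn00001", "rxn00002", "rxn00001"],
    "score": [0.9, 0.1, 0.5],
})

# DataFrame-style lookup: filters the whole frame on every call.
def score_from_df(gene, rxn):
    hit = scores_df[(scores_df["gene"] == gene) & (scores_df["reaction"] == rxn)]
    return float(hit["score"].iloc[0]) if not hit.empty else 0.0

# Nested-hash lookup: build the dict once, then each access is a constant-time lookup.
scores = {}
for row in scores_df.itertuples(index=False):
    scores.setdefault(row.gene, {})[row.reaction] = row.score

def score_from_hash(gene, rxn):
    return scores.get(gene, {}).get(rxn, 0.0)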

jjacobson95 commented 7 months ago

I’ll make those changes this weekend or early next week.

jjacobson95 commented 7 months ago

I have a fix for this that I'll be pushing shortly. It should reduce the runtime to about a minute.