Inconsistent results of MIP calculation and community minimum growth requirement

EmmettPeng commented 2 years ago

Hi Daniel,

I have been doing my research with SMETANA as an important tool for deciding the potential metabolic cooperation between species. From Zelezniak's paper, I learned the MIP value is decided by "community minimum growth requirement". This minimum requirement is also important for my work. So I did some modifications to your smetana code and make the metabolites manifest in the output.

In this previous issue page https://github.com/cdanielmachado/smetana/issues/10 you have mentioned that MIP value might be different for the same input. In fact, the "minimum requirement" also changed (as I observed), and that explains the change of MIP. But it does not make sense because when given a community with several GSMs, the metabolites should directly decide the minimum requirement and MIP. However, they become inconsistent between different runs.

Your explanation would be greatly appreciated if you have any! Thank you!

Emmett

franciscozorrilla commented 1 month ago

Hi Emmett, I am not the developer but I use SMETANA quite a bit as well. Regarding the inconsistencies that you are seeing, this likely has to do with the fact that there might be multiple alternative solutions to the minimal medium calculation in the non-interacting community, which would then affect the final MIP number. When you compare the metabolites across your outputs, are the results very different or is it small differences? Also, would be cool to see the modifications you made to SMETANA, are the available somewhere on github?

EmmettPeng commented 1 month ago

Hi Francisco,

I agree with the guess that the alternative minimal requirement caused the inconsistency of smetana global score.

And yes, I simply added some lines in interface.py in smetana to show the minimal requirement.

Function def run_global()

def run_global(comm_id, community, organisms, medium_id, excluded_mets, env, verbose, min_mol_weight, use_lp, debug):
    global_data = []
    debug_data = []

    if verbose:
        print('Running MIP for community {} on medium {}...'.format(comm_id, medium_id))

    mip, extras, metabolites = mip_score(community, environment=env, verbose=verbose,
                            min_mol_weight=min_mol_weight, use_lp=use_lp, exclude=excluded_mets)
    metabolites_str = ''
    for i in metabolites:
        metabolites_str = metabolites_str + ', ' + i
    metabolites_str = metabolites_str[2:]
    if mip is None:
        mip = 'n/a'

    if debug and extras is not None:
        mip_ni = ','.join(sorted(extras['noninteracting_medium']))
        mip_i = ','.join(sorted(extras['interacting_medium']))
        debug_data.append((comm_id, medium_id, 'mip', 'ni', mip_ni))
        debug_data.append((comm_id, medium_id, 'mip', 'i', mip_i))

    if verbose:
        print('Running MRO for community {} on medium {}...'.format(comm_id, medium_id))

    mro, extras = mro_score(community, environment=env, verbose=verbose,
                            min_mol_weight=min_mol_weight, use_lp=use_lp, exclude=excluded_mets)

    if mro is None:
        mro = 'n/a'

    if debug and extras is not None:
        comm_medium = ','.join(sorted(extras['community_medium']))
        debug_data.append((comm_id, medium_id, 'mro', 'community', comm_medium))
        for org, values in extras['individual_media'].items():
            org_medium = ','.join(sorted(values))
            debug_data.append((comm_id, medium_id, 'mro', org, org_medium))

    global_data.append((comm_id, medium_id, len(organisms), mip, mro, metabolites_str))

    return global_data, debug_data

Function def export_results()

def export_results(mode, output, data, debug_data, zeros):
    prefix = output + '_' if output else ''

    if mode == "global":

        df = pd.DataFrame(data, columns=['community', 'medium', 'size', "mip", "mro","minimal requirement"])
        df.to_csv(prefix + 'global.tsv', sep='\t', index=False)

        if len(debug_data) > 0:
            df = pd.DataFrame(debug_data, columns=['community', 'medium', 'key1', "key2", "data"])
            df.to_csv(prefix + 'debug.tsv', sep='\t', index=False)

    else:

        columns = ['community', 'medium', 'receiver', 'donor', 'compound', 'scs', 'mus', 'mps', 'smetana']
        df = pd.DataFrame(data, columns=columns)

        if not zeros:
            df = df.query('smetana > 0')

        df.to_csv(prefix + 'detailed.tsv', sep='\t', index=False)

Emmett

cdanielmachado / smetana

Inconsistent results of MIP calculation and community minimum growth requirement #34