Different results and run time of detailed analysis

vrmarcelino commented 3 years ago

Hi Daniel,

I’m running SMETANA in --detailed mode to estimate inter-species interactions, and noticed that analyses run on the same dataset return different numbers of interactions and different scores. Is this expected?

Also, I am trying to estimate interactions across groups of ca. 500 genomes. The analysis has been running for 3 days and is still in the “Running SCS” stage. Would excluding inorganic compounds help with runtime? Do you have any tips on how I could speed up or parallelise the analyses?

Many thanks!! Vanessa

cdanielmachado commented 3 years ago

Hi Vanessa,

The inconsistency between multiple runs is due to the fact that we are using a feature in CPLEX called solution pools:

https://www.ibm.com/support/knowledgecenter/SSSA5P_20.1.0/ilog.odms.cplex.help/CPLEX/UsrMan/topics/discr_optim/soln_pool/01_soln_pool_title_synopsis.html

This speeds up the computation of alternative solutions, but it uses some heuristics that can make the final result non-deterministic.

Regarding the speed, are you simulating the 500 genomes as one single 500-species community? SMETANA is not able to deal with such a large community. The largest we have managed to simulate so far had 50 species.

My suggestion for speeding this up is to build randomly sampled sub-communities, simulate them all in parallel, and merge the results at the end. For instance, you could generate 1000 random communities with 10-20 species each.
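
A minimal Python sketch of the sampling and merging steps, under stated assumptions. The function names, the `comm_0000` identifiers, and the two-column tab-separated layout (community id, organism id) are illustrative choices, not something confirmed in this thread; check the layout against what your SMETANA version expects for its communities input. Averaging scores over sub-communities is only one possible way to merge, and it ignores that an interaction can depend on which other species are present.

```python
import random
from collections import defaultdict


def sample_communities(organisms, n_communities=1000, min_size=10, max_size=20, seed=0):
    """Draw random sub-communities (without replacement inside each community).

    `organisms` is a list of organism ids, e.g. model file names without extension.
    A fixed seed makes the sampling itself reproducible.
    """
    rng = random.Random(seed)
    communities = {}
    for i in range(n_communities):
        size = rng.randint(min_size, max_size)  # inclusive on both ends
        communities[f"comm_{i:04d}"] = sorted(rng.sample(organisms, size))
    return communities


def write_communities(communities, path):
    """Write one 'community<TAB>organism' line per member (assumed layout)."""
    with open(path, "w") as fh:
        for community_id, members in communities.items():
            for organism in members:
                fh.write(f"{community_id}\t{organism}\n")


def merge_scores(rows):
    """Average a score over all sub-communities in which an interaction appears.

    `rows` is an iterable of (receiver, donor, compound, score) tuples collected
    from the per-community results; how you parse them depends on the output
    columns of your SMETANA version.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for receiver, donor, compound, score in rows:
        entry = totals[(receiver, donor, compound)]
        entry[0] += score
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

To parallelise, the sampled communities can be split across several files and each file submitted as an independent job, since the sub-communities do not depend on one another. Because the solution-pool heuristics make individual runs non-deterministic, counting how often an interaction recurs across sub-communities may also be a useful robustness measure alongside the averaged score.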

vrmarcelino commented 3 years ago

Ok, that sounds reasonable. Thanks!!

cdanielmachado / smetana

Different results and run time of detailed analysis #17