we strongly believe that mock communities are more reliable than simulated data for benchmarking, as they include "known unknowns" such as the full effects of GC bias and sequencing adapter contamination;
we don't know of any more complex mock communities;
the paper is not about tools for detecting strain variation, but rather about the conclusion (in agreement with other papers, most notably CAMI) that strain variation has a significant but perhaps invisible effect on real metagenome assembly;
also note that our approach for detecting strain variation in this paper is reference-dependent, which makes it unusable for real metagenomes :(
""" The authors should design more simulated datasets to test the performance of these tools on detecting strain-level variants. """