Closed — Sidduppal closed this issue 2 years ago
I think it may be helpful to put an example into a Jupyter notebook and link to it (or to https://docs.readthedocs.io/en/stable/guides/jupyter.html), especially since the tutorial jumps around between running things on the command line and in Python.
I think the document would benefit from a bit of rearrangement, including:

- `Autometa Test Datasets` should be combined with `Downloading Test Datasets` and should be in the same location as `Downloading Test Datasets`.
Titles:

- Move the heading `Example benchmarking with simulated communities` to directly before the heading `Benchmark clustering`.
Make the examples for each type of clustering clearer by presenting them in the same context:

- `Benchmark clustering` is presented in the context of for-looping over community sizes.
- `Benchmark classification` is presented in the context of for-looping over community sizes, but the for-loop isn't shown.
- `Benchmark clustering-classification` is presented in the context of running only on a single community size.

The examples should be simple, so I would suggest presenting each only in the context of running on a single sample/community size. An "Advanced" section could discuss how to handle running on multiple samples.
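To make the suggestion concrete, the simple single-sample example and the "Advanced" multi-sample loop could look something like the sketch below. The `--predictions`/`--reference` flags are the ones named in this issue; the sample names and directory layout are hypothetical, and `echo` is used so the loop structure is visible without actually running the tool:

```shell
# Simple case: benchmark one sample/community size.
# Hypothetical paths; remove the leading echo to actually run the command.
echo autometa-benchmark \
    --predictions "community_A/predictions.tsv" \
    --reference "community_A/reference.tsv"

# "Advanced" section: the same command for-looped over multiple communities.
for community in community_A community_B community_C; do
    echo autometa-benchmark \
        --predictions "${community}/predictions.tsv" \
        --reference "${community}/reference.tsv"
done
```

Presenting the single invocation first, then the loop as a minimal extension of it, would make clear that the loop adds nothing conceptually new.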
Generally, the commands/code need more description. As it stands, it is still unclear which commands to run in order to benchmark. (Is following the section `Benchmark clustering-classification` the same as running `Benchmark clustering` and `Benchmark classification` separately?) This could possibly be addressed with more descriptive headings, e.g. `Benchmark clustering` becomes `Benchmark Binning Results`; `Benchmark classification` becomes `Benchmark taxonomic assignments`; etc.
The section `Aggregate results across simulated communities` seems to be just data handling. If so, it makes the documentation less clear; remove it, or place it in an "Advanced" section (or similar) at the end.
Flags such as `--output-long` and `--output-classification-reports` aren't described/defined.
Inputs aren't described, e.g. how are the input files for `--predictions`, `--reference`, etc. supposed to be structured?
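For instance, the docs could include a small worked example of the input files. The two-column, tab-separated contig-to-bin layout and the column names below are assumptions (not taken from the docs) and would need to be confirmed against `autometa-benchmark` itself:

```shell
# Hypothetical input layout: tab-separated, header row, one contig per line.
# Column names ("cluster", "reference_genome") are assumptions.
printf 'contig\tcluster\ncontig_1\tbin_0001\ncontig_2\tbin_0001\ncontig_3\tbin_0002\n' > predictions.tsv
printf 'contig\treference_genome\ncontig_1\tgenome_A\ncontig_2\tgenome_A\ncontig_3\tgenome_B\n' > reference.tsv

cat predictions.tsv
cat reference.tsv
```

Even a short example like this would answer the structure question at a glance.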
The documentation is missing a discussion of what `autometa-benchmark` actually does and the results it produces.
Closing this for now. Please submit a new PR from a new KwanLab/Autometa branch with any changes still necessary.
Addressed a few comments in https://github.com/KwanLab/Autometa/pull/215
- 📝 Add docs for classification and clustering-classification
- 📝 Add docs to run multiple results of the same community at once
- 🔥 Remove code to aggregate results