KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Small test datasets to sanity check/attempt to break code #152

Open chasemc opened 3 years ago

chasemc commented 3 years ago

We have some synthetic and simulated datasets, but it would be nice to have some additional ultra-small test datasets to sanity-check code, e.g. metagenome.fna files. Other (non-comprehensive) examples:

evanroyrees commented 3 years ago

This is similar to #86. It may be worth having it all in one place.

chasemc commented 3 years ago

On this theme, it would be nice to have a coupled small metagenome.fna and as-minimal-as-possible .dmnd for development and testing purposes where the pipeline would finish much faster.
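As a sketch of what an ultra-small, deliberately awkward metagenome.fna for sanity checks could contain, the Python below writes four contigs, each aimed at a likely edge case: an ordinary contig, one too short to survive a typical length cutoff, one made entirely of `N`s, and a soft-masked (lowercase) one. The helper names, the SPAdes-style headers (`NODE_<n>_length_<len>_cov_<cov>`), and the particular edge cases are assumptions chosen for illustration, not anything Autometa defines.

```python
import random
from pathlib import Path
from typing import Dict


def random_seq(rng: random.Random, length: int) -> str:
    """Return a random uppercase DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def write_tiny_metagenome(path: Path, seed: int = 42) -> Dict[str, str]:
    """Write a handful of edge-case contigs to a FASTA file (hypothetical helper).

    Each record targets a likely failure mode rather than realistic biology.
    Returns the header -> sequence mapping that was written.
    """
    rng = random.Random(seed)  # seeded so the file is identical on every run
    records = {
        # Ordinary contig, long enough to pass a typical length cutoff
        "NODE_1_length_5000_cov_12.5": random_seq(rng, 5000),
        # Very short contig that a length filter should drop
        "NODE_2_length_50_cov_3.0": random_seq(rng, 50),
        # Only ambiguous bases: nothing left after N-filtering
        "NODE_3_length_300_cov_1.0": "N" * 300,
        # Soft-masked (lowercase) sequence
        "NODE_4_length_1200_cov_40.0": random_seq(rng, 1200).lower(),
    }
    with open(path, "w") as fh:
        for header, seq in records.items():
            fh.write(f">{header}\n")
            # Wrap sequence lines at 60 columns, as many FASTA writers do
            for i in range(0, len(seq), 60):
                fh.write(seq[i : i + 60] + "\n")
    return records
```

Because the generator is seeded, a test that fails on this file can be reproduced exactly; adding a new edge case is just one more dictionary entry.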

chasemc commented 3 years ago

Duplicate of #86

chasemc commented 3 years ago

Small "nr.dmnd" database that can be used with the "78.125Mbp" dataset. It contains most, but not all, proteins from the genomes in "reference_assignments.tsv".

https://drive.google.com/file/d/1Aa9KVhJNa52GZSCIY7cFUTD_wPWoM4T_/view?usp=sharing
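A reduced database like this could, for instance, be produced by keeping only selected records from a larger protein FASTA and then indexing the result with `diamond makedb --in subset.faa --db nr_small`. The sketch below covers only the filtering step. `filter_fasta` is a hypothetical helper, and it assumes the set of protein IDs to keep has already been derived (for example from the genomes listed in "reference_assignments.tsv", whose layout is not described here). A record's ID is taken to be the first whitespace-delimited token of its header.

```python
from pathlib import Path
from typing import Iterable


def filter_fasta(in_path: Path, out_path: Path, keep_ids: Iterable[str]) -> int:
    """Copy records whose ID is in keep_ids from in_path to out_path.

    Hypothetical helper. Streams line by line, so a large input (e.g. a full
    protein FASTA) never has to fit in memory. Returns the number of records
    written.
    """
    keep = set(keep_ids)
    written = 0
    writing = False
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Record ID = first whitespace-delimited token after '>'
                parts = line[1:].split()
                record_id = parts[0] if parts else ""
                writing = record_id in keep
                if writing:
                    written += 1
            # Header and sequence lines inherit the decision made at the header
            if writing:
                dst.write(line)
    return written
```

Streaming keeps memory use flat regardless of input size; only the keep-set is held in memory.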

chasemc commented 3 years ago

Unduplicated this.

Still need tests for Nextflow.

chasemc commented 2 years ago

Tagging @ajlail98 as this is basically what you were working on for the benchmarking stuff.

It's related to https://github.com/KwanLab/Autometa/issues/86, but that issue covers a lot of ground, so it's probably best to keep comments about test data here.

The nf-core/mag test dataset might be useful. I haven't looked at it closely, and I suspect it's not a minimal dataset; however, it could probably be used as a "complete" dataset that's run infrequently, e.g. for testing merges onto the main branch: