Open chasemc opened 3 years ago
This is similar to #86. Maybe it's worth having it all in one place.
On this theme, it would be nice to have a coupled small metagenome.fna and as-minimal-as-possible .dmnd for development and testing purposes where the pipeline would finish much faster.
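A minimal sketch of how such a small metagenome.fna could be cut from an existing assembly, assuming plain uncompressed FASTA input. The function names (`read_fasta`, `subsample_fasta`) and their parameters are hypothetical and not part of Autometa; this just keeps the first few contigs above a length cutoff.

```python
from pathlib import Path
from typing import Iterator, Tuple, Union

PathLike = Union[str, Path]


def read_fasta(path: PathLike) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an uncompressed FASTA file."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def subsample_fasta(
    src: PathLike,
    dest: PathLike,
    max_contigs: int = 10,
    min_length: int = 0,
) -> int:
    """Write the first `max_contigs` records of at least `min_length` bp.

    Hypothetical helper for building a tiny test input; returns the
    number of records written to `dest`.
    """
    written = 0
    with open(dest, "w") as out:
        for header, seq in read_fasta(src):
            if written >= max_contigs:
                break
            if len(seq) < min_length:
                continue  # too short to be useful as test data
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written
```

For example, `subsample_fasta("metagenome.fna", "small_metagenome.fna", max_contigs=100, min_length=1000)` would write at most 100 contigs of 1 kbp or longer (the output file name and cutoffs here are placeholders). Taking the first records keeps the result deterministic, which is convenient for tests, but it is not a representative sample of the community.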
Duplicate of #86
Small "nr.dmnd" database that can be used with the "78.125Mbp" dataset. Contains most, but not all, proteins from the genomes in "reference_assignments.tsv".
https://drive.google.com/file/d/1Aa9KVhJNa52GZSCIY7cFUTD_wPWoM4T_/view?usp=sharing
Unduplicated this.
Still need tests for Nextflow.
Tagging @ajlail98 as this is basically what you were working on for the benchmarking stuff.
It's related to https://github.com/KwanLab/Autometa/issues/86, but that issue contains a lot of stuff, so it's probably best to make comments about test data here.
The nf-core mag test dataset might be useful (I haven't looked at it closely, and I suspect it's not a minimal dataset). However, it could probably be used as a "complete" dataset that's run infrequently, e.g. for testing merges onto the main branch:
We have some synthetic and simulated datasets. It would be nice to have some additional ultra-small test datasets to sanity-check code, e.g. metagenome.fna files. Other (not comprehensive) examples: