Closed elizabethmcd closed 1 year ago
I think this is fine for now, but usually test data is packaged with the software...what was your thinking on keeping it separate? Also I know you're working in a strict framework and maybe this is unnecessary but in case it's useful, here's an example of a larger test data set getting downloaded for CI:
https://github.com/spacegraphcats/spacegraphcats/blob/latest/Makefile
https://github.com/spacegraphcats/spacegraphcats/blob/latest/.github/workflows/test.yml#L52
I'll wait to approve until I have a better understanding of your holistic approach to testing here
The nf-core pipelines suggest keeping the physical test datasets separate from the workflow so as not to bloat the workflow repo. I don't know whether that causes more or less confusion. Keeping the data in the workflow repo would definitely make this intermediate solution for the --input parameter easier, since it would just refer directly to the test dataset directory within the workflow repo.
My approach and motivation for testing currently is just to make sure the workflow runs and data gets through successfully. I don't think the stub profile has full functionality yet, so to do this I need a mini test dataset, as nf-core suggests. I could modify this pull request to put the test data directly in this repo, since it's better contained that way. However, I think nf-core's reasoning for putting test data on GitHub/S3 is for when you run nextflow run Arcadia-Science/metagenomics -profile test,docker. I don't know whether it can then still refer to test data directly in the workflow repo.
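For reference, a rough sketch of what the in-repo option could look like. Nextflow exposes projectDir in config files, and it resolves to wherever the pipeline was checked out, which should include the copy Nextflow pulls when the workflow is run by its GitHub name. The conf/test.config path and the test-data/ directory below are hypothetical and don't exist in this repo; this is an untested illustration, not the config in this pull request.

```groovy
// conf/test.config (hypothetical path, following the usual nf-core layout)
params {
    // ASSUMPTION: the test fastqs live in a test-data/ directory at the
    // root of the workflow repo. projectDir points at the pipeline checkout,
    // so the glob would not depend on where the user launches the run from.
    input = "${projectDir}/test-data/*_{1,2}.fq.gz"
}

// In nextflow.config, the test profile would then pull this file in:
// profiles {
//     test { includeConfig 'conf/test.config' }
// }
```

The trade-off is the one already mentioned above: the glob is simple and self-contained, but the fastqs then add to the size of the workflow repo.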
This pull request starts the process of adding a test dataset and configuration. The subsampled test files are located in the Arcadia-Science/test-datasets repo at https://github.com/Arcadia-Science/test-datasets/tree/main/cheese-illumina-metagenomes. The test config uses a small number of CPUs and little memory to test the overall workflow. Currently the workflow can be tested locally with
nextflow run main.nf -profile test,docker --input ../test-datasets/cheese-illumina-metagenomes/"*_{1,2}.fq.gz" --outdir test
where the test fastqs are local because calling from a GitHub repo requires them to be listed in a CSV and I haven't added support for the samplesheet_csv module yet. With this test the workflow runs on 2 subset samples in ~7 minutes:
To fix in a later pull request(s):