BIMSBbioinfo / pigx_sars-cov-2

PiGx SARS-CoV-2 wastewater sequencing pipeline
GNU General Public License v3.0

Clarification: Database handling in README and Documentation #160

Closed vicfabienne closed 1 year ago

vicfabienne commented 1 year ago

Hey, for the latest version it is not quite clear whether the databases (vep-db, kraken-db, etc.) still have to be downloaded manually.

I see that Guix automatically downloads something, and I see here https://github.com/BIMSBbioinfo/pigx_sars-cov-2/blob/main/tests/setup_test_settings.yaml that it can be used for running tests. But based on the comment in the YAML file: are those the full databases now? If so, which version? Or can they only be used for testing, without giving meaningful results on real data?

@jonasfreimuth please clarify and then I can add it to the README and docs

jonasfreimuth commented 1 year ago

There's a distinction between the files tests/setup_test_settings.yaml and tests/settings.yaml.

Both are supposed to use the same data (i.e. reads) and only check if everything runs. The test read dataset only completes with an unrealistically low mutation coverage threshold.

tests/settings.yaml is the file used for running tests during development, e.g. during make distcheck. It contains the modified download paths that point to the downsampled datasets we use for testing. I will write up later exactly what is downsampled and how that affects the results.

tests/setup_test_settings.yaml is used to check whether the user has set up the pipeline correctly (see also the Quick Start section in the PiGx docs). The idea was that a user who has just installed the pipeline may not have access to the default database directory.

About the downloads: tests/setup_test_settings.yaml (unless modified) downloads the official datasets, as specified in the default etc/settings.yaml.in file.
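To make the distinction concrete, here is a minimal Python sketch, assuming that a test settings file is layered over the defaults from etc/settings.yaml.in so that any key it does not override keeps its default value. The merge mechanism, the key names, the example.org URLs, and the threshold numbers are all hypothetical and are not the real PiGx settings schema.

```python
from copy import deepcopy


def merge_settings(defaults, overrides):
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, standing in for etc/settings.yaml.in (not the real schema).
defaults = {
    "locations": {"database-dir": "/default/databases"},
    "databases": {"kraken-db": {"url": "https://example.org/official/kraken-db"}},
    "parameters": {"mutation-coverage-threshold": 90},
}

# Setup check (like tests/setup_test_settings.yaml): only the database location
# changes, so the official database downloads are still used.
setup_test = {"locations": {"database-dir": "./databases"}}

# Development tests (like tests/settings.yaml): downsampled downloads plus an
# unrealistically low coverage threshold so the tiny read set completes.
dev_test = {
    "locations": {"database-dir": "./databases"},
    "databases": {"kraken-db": {"url": "https://example.org/downsampled/kraken-db"}},
    "parameters": {"mutation-coverage-threshold": 1},
}

print(merge_settings(defaults, setup_test)["databases"]["kraken-db"]["url"])
print(merge_settings(defaults, dev_test)["databases"]["kraken-db"]["url"])
```

Under this assumption, the setup check still resolves to the official database URL while the development settings resolve to the downsampled one.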

vicfabienne commented 1 year ago

Sorry, I may not have been specific enough.

I understand the test dataset stuff with the reads.

My question was about the databases.

When the pipeline is installed using Guix, are the databases shipped with the package, or do they still have to be installed manually?

Also, what is the current way to run a quick test example when the package is installed via Guix? Is there built-in functionality for it, or would the user need to download the test directory?

rekado commented 1 year ago

The test databases are not installed in the Guix package. They are unpacked in /tmp/.local/share/pigx/databases, which is destroyed once the build is complete.

jonasfreimuth commented 1 year ago

OK, so all database downloads are done either by the user or by the pipeline scripts; Guix doesn't figure into that. To quickly run something, you do actually have to download the test directory if you don't have it already. ~That isn't mentioned in the docs yet, actually, so good point ^^~ That is actually the whole point of the Quick Start section. (I was in a bit of a hurry when I wrote the previous comment.)
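Since neither the test directory nor the databases come with the Guix package, a quick pre-flight check before a first run can save a confusing failure. The following is a minimal Python sketch; tests/setup_test_settings.yaml is named in this thread, but placing the databases under the same base directory and the subdirectory names databases/kraken-db and databases/vep-db are hypothetical assumptions, not the pipeline's actual layout.

```python
from pathlib import Path

# Hypothetical layout, for illustration only: the settings file name comes from
# this thread, the database subdirectory names are assumptions.
REQUIRED = [
    "tests/setup_test_settings.yaml",
    "databases/kraken-db",
    "databases/vep-db",
]


def missing_inputs(base_dir, required=REQUIRED):
    """Return the relative paths under base_dir that do not exist yet."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Download these before the quick start run:", ", ".join(missing))
    else:
        print("Test directory and databases found.")
```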

jonasfreimuth commented 1 year ago

Closing, the last explanation should be clear enough.