bokulich-lab / q2-fondue

Functions for reproducibly Obtaining and Normalizing Data re-Used from Elsewhere
BSD 3-Clause "New" or "Revised" License

Fetching sequences fails sometimes when sra-tools are not configured properly #70

Closed: misialq closed this issue 2 years ago

misialq commented 2 years ago

Just opening this issue to have this documented in one place as it's not necessarily something we can "fix".

As @adamovanja pointed out elsewhere, fetching sequences sometimes fails with an "invalid accession ..." error from fasterq-dump, and it seems to happen only with the latest version of q2-fondue. After some investigation, my impression is that this is caused by the sra-tools configuration, which is set using the vdb-config tool. There are two scenarios:

  1. prefetch's download location is set to "current directory" (this is the default option): prefetch manages to download everything as expected but fasterq-dump fails (who knows why, I couldn't really find that out)
  2. prefetch's download location is set to "user-repository" and the repository value is set (tab "Cache"): sequences are fetched correctly (note that if the repository is not set, the download will likely fail)

This behaviour is observed on sra-tools version 2.11.0 (currently available via conda). When using the latest version of the toolkit (2.13.0) I have not observed the same issue: the downloads seemed to succeed regardless of the repository settings.
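
For anyone who wants to check whether their installed toolkit predates 2.13.0, a minimal sketch of a version comparison follows. The helper name version_lt is hypothetical, and it assumes plain three-part numeric versions such as 2.11.0. How the installed version string is obtained (for example from fasterq-dump --version) is left to the caller, since the exact output format is not assumed here.

```shell
# Hypothetical helper: succeeds (exit status 0) if version $1 is older than $2.
# Assumes both arguments look like MAJOR.MINOR.PATCH with numeric parts only.
version_lt() {
    local oldest
    if [ "$1" = "$2" ]; then
        return 1
    fi
    # Sort numerically on each dot-separated field and take the smallest.
    oldest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$oldest" = "$1" ]
}

# Example: 2.11.0 is the version where the failure was observed.
if version_lt "2.11.0" "2.13.0"; then
    echo "sra-tools is older than 2.13.0: configuring the user repository is advisable"
fi
```

Replacing the literal "2.11.0" with the version reported by the local installation turns this into a quick pre-flight check.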

To set the repository location, one can use vdb-config in interactive mode or just run these two commands:

vdb-config -s "/repository/user/main/public/root=<your cache location>"
vdb-config --prefetch-to-user-repo
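
The two commands above could be wrapped in a small function so that the cache directory is created first and the setup is repeatable. This is only a sketch: the function name configure_sra_cache and the DRY_RUN switch are hypothetical, and the cache path is whatever location the user chooses.

```shell
# Hypothetical wrapper around the two vdb-config commands from this issue.
# Usage: configure_sra_cache <cache directory>
# With DRY_RUN=1 the commands are only printed, not executed.
configure_sra_cache() {
    local cache_dir="$1"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # Make sure the cache location exists before pointing sra-tools at it.
    $run mkdir -p "$cache_dir"
    # Set the user repository root (the "Cache" tab in interactive mode).
    $run vdb-config -s "/repository/user/main/public/root=$cache_dir"
    # Tell prefetch to download into the user repository.
    $run vdb-config --prefetch-to-user-repo
}
```

Calling DRY_RUN=1 configure_sra_cache "$HOME/sra-cache" prints the commands for inspection; dropping DRY_RUN=1 runs them.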

Proposed solution: If someone else can reproduce this, I would say we should just add a section at the beginning of the README/tutorial telling users to run the configuration tool after installing q2-fondue and explaining what they need to set where. Once the newest toolkit version becomes available via conda, we can just upgrade, and that should solve the issue.

adamovanja commented 2 years ago

Thanks for looking into this, @misialq. I managed to reproduce it, namely that:

I went ahead and updated the README with these SRA Toolkit configuration instructions in #74 (it's still WIP, as I aim to verify whether local file caching must be enabled for q2-fondue to run).