antonisdim / haystac

Code repository for the HAYSTAC pipeline
MIT License

sra-toolkit error - files failed to download #2

Closed ksavhughes closed 3 years ago

ksavhughes commented 3 years ago

Hi Evan Irving-Pease and Evangelos Dimopoulos,

When I try to run haystac sample to download R1 and R2 fastq files using an SRA accession number, I keep getting this error and the files are not downloaded:

Job 2: Download SRA files for accession SRS7890498.
Waiting at most 3 seconds for missing files.
MissingOutputException in line 43 of /home/.conda/envs/haystac/lib/python3.6/site-packages/haystac/workflow/rules/sra.smk:
Job Missing files after 3 seconds:
/mnt/Data/References/SRS7890498/SRS7890498/sra_data/PE/SRS7890498_1.fastq
/mnt/Data/References/SRS7890498/SRS7890498/sra_data/PE/SRS7890498_2.fastq
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 2 completed successfully, but some output files are missing. 2
File "/home/.conda/envs/haystac/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 583, in handle_job_success
File "/home/.conda/envs/haystac/lib/python3.6/site-packages/snakemake/executors/__init__.py", line 259, in handle_job_success
Trying to restart job 2.

If it is a problem with the latency wait time, how do I change that?

I've looked at the log files and have gotten two different messages:

  1. 2021-04-06T14:57:29 fasterq-dump.2.10.9 err: invalid accession 'SRS7890498'
  2. While trying to figure this error out, I got a different message in the log file: This sra toolkit installation has not been configured. Before continuing, please run: vdb-config --interactive. For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/

The first message I don't understand because I know that accession number is correct and corresponds to an SRA entry. I looked into the second message and saved the configuration, but that didn't solve the problem. I'm not sure how haystac specifically wants/needs that file to be configured, so maybe I configured it incorrectly?

Also, I looked at the workflow/rules/sra.smk file, specifically at the "rule get_sra_fastq_pe" code. I was confused by the input line: os.path.expanduser("~/.ncbi/user-settings.mkfg"). I looked in my user directory for this file, but it is not there... Maybe that is the issue?

Thank you in advance for any help and let me know if you need any other information.

ekirving commented 3 years ago

Hi Karissa,

Thank you for your bug report, and sorry to hear you've had trouble running haystac.

I've had a look at the problem you're experiencing and you seem to have found an edge case for our validator.

The code you supplied, SRS7890498, is actually the SRA version of the BioSample code SAMN17080935 (see https://www.ncbi.nlm.nih.gov/biosample/SAMN17080935)

As such, it does not refer to a library or a specific sequencing run, but instead points to a sample, which can have multiple individual libraries. In this specific case, there happens to be only one run accession, which is SRR13263123 (see https://www.ncbi.nlm.nih.gov/sra?term=SRS7890498).

If you try running haystac using the SRA run accession SRR13263123 it should work.

With that said, this issue highlights two errors in haystac:

  1. The code that checks if the SRA accession is valid did not identify that this was the wrong type of accession; and
  2. The error message was not correct; it should have displayed: fasterq-dump.2.10.9 err: invalid accession 'SRS7890498'
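
For anyone landing here with the same problem, the kind of check described in point 1 can be sketched roughly as below. This is only an illustration, not haystac's actual validator: the function names `accession_type` and `require_run_accession` are made up for this example, and it relies solely on the standard INSDC prefix convention (first letter S/E/D for SRA/ENA/DDBJ, third letter R/X/S/P for run/experiment/sample/study). It does not query NCBI, so it cannot tell whether an accession actually exists.

```python
import re

# INSDC accession prefixes: the first letter is the archive (S = NCBI SRA,
# E = EBI ENA, D = DDBJ) and the third letter is the record type.
_RECORD_TYPES = {"R": "run", "X": "experiment", "S": "sample", "P": "study"}
_ACCESSION_RE = re.compile(r"^[SED]R([RXSP])\d+$")


def accession_type(accession: str) -> str:
    """Return 'run', 'experiment', 'sample', 'study' or 'unknown'."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    return _RECORD_TYPES[match.group(1)] if match else "unknown"


def require_run_accession(accession: str) -> str:
    """Hypothetical validator: accept only run accessions (SRR/ERR/DRR)."""
    kind = accession_type(accession)
    if kind != "run":
        raise ValueError(
            f"'{accession}' has accession type '{kind}', "
            "but a run accession (SRR/ERR/DRR) is required"
        )
    return accession
```

With the accessions from this thread, `accession_type("SRS7890498")` returns `"sample"` and `accession_type("SRR13263123")` returns `"run"`, so `require_run_accession` would reject the former with an explicit message rather than letting the download step fail later.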

We'll have a look into fixing these two issues to prevent this problem recurring.

Cheers, Evan

ksavhughes commented 3 years ago

Hi Evan,

Yes, it works now! Thank you for your help! I didn't know that I needed to use the run accession, not the sample accession (honestly did not know there was a difference, but it makes sense now).

Best, Karissa

antonisdim commented 3 years ago

Hello Karissa,

I hope you are doing great!

Additional user input validation for SRA accessions has been added in v0.4.4 (please install the new version from GitHub).

Apologies for the delay, and of course please let us know if you face any new issues!

Best, Antony