enasequence / enaBrowserTools

A collection of scripts to assist in the retrieval of data from the ENA Browser
Apache License 2.0

Errors accessing fastq files through enaDataGet #49

Closed CharlesNunn001 closed 1 year ago

CharlesNunn001 commented 4 years ago

Hi,

A number of accessions return errors when attempting to download fastq files with enaDataGet. For each of the three cases below I've included a description of the issue, the error, and an example lane ID. enaBrowserTools version 1.5.3

  1. Lane suppression: when a lane is suppressed, enaDataGet cannot access it, but it still attempts to retrieve the record and fails with the error below.
    /.singularity.d/actions/exec: 21: exec: enaDataGet: not found
    Traceback (most recent call last):
    File "/opt/enaBrowserTools-1.5.5/python3/enaDataGet.py", line 99, in <module>
    readGet.download_files(accession, output_format, dest_dir, fetch_index, fetch_meta, aspera)
    File "/opt/enaBrowserTools-1.5.5/python3/readGet.py", line 121, in download_files
    if utils.is_empty_dir(target_dir):
    UnboundLocalError: local variable 'target_dir' referenced before assignment
    ERROR: Something unexpected went wrong please try again.
    If problem persists, please contact datasubs@ebi.ac.uk for assistance, with the above error details.

    A lane with this error is SRR8901447.

  2. Missing lane: if a lane accession exists but has no downloadable files on ENA, it returns this error:
    mv: cannot stat ‘/path_download/SRR9998315/*’: No such file or directory

    A lane with this error is SRR9998315.

  3. Unrecognised accession: if enaDataGet does not recognise the accession, it returns a clear error message saying so; however, the lanes are present on ENA and can be downloaded with wget.
    Checking availability of https://www.ebi.ac.uk/ena/browser/api/xml/SRR10049758
    ERROR: Invalid accession provided

    A lane with this issue is SRR10049758.

It would be appreciated if you could provide descriptive errors for the first and second issues and a solution for the third.

Thanks!

rpolicastro commented 4 years ago

@CharlesNunn001 For problem 3, I found in their code that they are checking for SRR numbers with a maximum of 7 digits. Your accession number failed because it's 8 digits. I've initiated a pull request to fix this problem.
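To illustrate the kind of check described above, here is a minimal sketch. The two patterns are hypothetical and are not copied from the enaBrowserTools source; they only show how a regex that caps the numeric part at 7 digits would reject an 8-digit run accession, and how relaxing the upper bound would accept it.

```python
import re

# Hypothetical patterns for illustration only; the real regex in
# enaBrowserTools may look different.
OLD_RUN_PATTERN = re.compile(r"^[EDS]RR[0-9]{6,7}$")  # at most 7 digits
NEW_RUN_PATTERN = re.compile(r"^[EDS]RR[0-9]{6,}$")   # 6 or more digits


def is_run(accession, pattern):
    """Return True if the accession matches the given run pattern."""
    return pattern.match(accession) is not None


for acc in ("SRR8901447", "SRR10049758"):
    print(acc, "old:", is_run(acc, OLD_RUN_PATTERN), "new:", is_run(acc, NEW_RUN_PATTERN))
```

Under these assumed patterns, the 7-digit SRR8901447 passes both checks, while the 8-digit SRR10049758 passes only the relaxed one.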

josieburgin commented 4 years ago

Hi,

Both (1) and (3) should be fixed in the latest release. (1) was caused by a difference in behaviour between the old and new browser APIs, which meant that the record validation checks were not working as intended; this is now fixed. (3) was caused by the regex that validates accessions, which was out of date and has recently been updated.
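For readers wondering how a failed validation check can surface as the UnboundLocalError reported in (1), here is a minimal sketch. The function names and the path are hypothetical and this is not the actual readGet.download_files code; it only shows the general pattern of a variable that is bound in one branch and used unconditionally afterwards, alongside a variant that fails early with a descriptive message.

```python
def download_files_unchecked(record_available):
    # Hypothetical: target_dir is only bound when the record is available.
    if record_available:
        target_dir = "/tmp/example_download"
    # If the branch above was skipped, this line raises UnboundLocalError.
    return target_dir


def download_files_checked(accession, record_available):
    # Hypothetical: validate first and report a descriptive error.
    if not record_available:
        raise ValueError(
            f"{accession}: record is suppressed or unavailable; nothing to download"
        )
    target_dir = "/tmp/example_download"
    return target_dir


try:
    download_files_unchecked(record_available=False)
except UnboundLocalError as err:
    print("unchecked variant failed with:", type(err).__name__)

try:
    download_files_checked("SRR8901447", record_available=False)
except ValueError as err:
    print("checked variant failed with:", err)
```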

I have not been able to replicate issue (2). Could you please clarify what you mean by 'missing downloadable files'? For example, there are downloadable files in the FASTQ ftp column of https://www.ebi.ac.uk/ena/browser/view/SRR9998315 - were these not available at the time you opened this issue?

Thanks