BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in if (ret == 1) break : argument is of length zero while downloading SNV data #512

Open swapnil-nis opened 2 years ago

swapnil-nis commented 2 years ago

I have been trying to download SNV data for TCGA-HNSC for the last two months, but every attempt fails with the same error, shown below:

query <- GDCquery(
  project = "TCGA-HNSC",
  data.category = "Simple Nucleotide Variation",
  data.type = "Annotated Somatic Mutation"
)

GDCdownload(query)

Downloading data for project TCGA-HNSC
GDCdownload will download 5120 files. A total of 2.598036157 GB
The total size of files is big. We will download files in chunks
Downloading chunk 1 of 3 (1970 files, size = 1.002520368 GB) as Mon_May__2_21_37_55_2022_0.tar.gz
  |======================================================================| 100%
/bin/tar: This does not look like a tar archive

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 3 (1970 files, size = 1.002520368 GB) as Mon_May__2_21_37_55_2022_0.tar.gz
  |======================================================================| 100%
/bin/tar: This does not look like a tar archive

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
Error in if (ret == 1) break : argument is of length zero
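The tar and gzip messages above indicate that whatever was saved under the .tar.gz name was not a gzip archive. A minimal base R sketch for checking this by hand is shown below. The helper name is_gzip_file is hypothetical and not part of TCGAbiolinks; it only tests whether a file starts with the two gzip magic bytes (0x1f 0x8b), and it does not explain why the download produced something else.

```r
# Hypothetical helper (not part of TCGAbiolinks): TRUE when the file starts
# with the two gzip magic bytes 0x1f 0x8b, FALSE otherwise.
is_gzip_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  # raw = TRUE reads the bytes as stored, without transparent decompression.
  con <- file(path, open = "rb", raw = TRUE)
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Self-contained demonstration using temporary files.
good <- tempfile(fileext = ".gz")
con <- gzfile(good, open = "w")
writeLines("example", con)
close(con)

# A text file that merely carries a .tar.gz name, as an illustration.
bad <- tempfile(fileext = ".tar.gz")
writeLines("{\"message\": \"not an archive\"}", bad)

is_gzip_file(good)
is_gzip_file(bad)
```

If the chunk file is still present in the working directory after the failure (an assumption; it may have been cleaned up), the same check can be run on it, and opening a non-gzip file in a text editor may reveal an error message in place of the archive.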

tiagochst commented 2 years ago

Hi, Which version of TCGAbiolinks are you using?

These data are controlled and should not be accessible without a token. Also, the data appear to be either MAF or VCF files.

Mo7tafa commented 1 year ago

I have the same issue! I have checked the samples I want to download a couple of times, and there seems to be nothing wrong with them. I still can't figure out what is going wrong.

tiagochst commented 1 year ago

@Mo7tafa These are controlled data. You need to pass the token.file parameter to GDCdownload, or export the manifest and use a GDC token to download the controlled data.

Mo7tafa commented 1 year ago

I've checked my data and none of it is controlled! I also used the access = "Open" argument in GDCquery to make sure no controlled data are involved!

tiagochst commented 1 year ago

Please could you post the query function?

Mo7tafa commented 1 year ago

@tiagochst I am using R 4.2.3 on Windows.

library("TCGAbiolinks")

NewData <- GDCquery(
  project = "TCGA-UVM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts",
  access = "Open"
)

GDCdownload(NewData, method = "api")

Mo7tafa commented 1 year ago

I also tried

GDCdownload(query = NewData, method = "api", files.per.chunk = 1)

but I still get the same error: Error in if (ret == 1) break : argument is of length zero
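For context on the message itself: R raises "argument is of length zero" whenever the condition given to if() has length zero, for example when the compared value is NULL. The base R sketch below reproduces this. That ret is NULL (or otherwise empty) inside GDCdownload is an assumption made for illustration, not something confirmed in this thread, and download_ok is a hypothetical helper.

```r
# `ret` stands in for a status value that was never set. This is an
# assumption about the download loop, not taken from the package source.
ret <- NULL

# NULL == 1 evaluates to logical(0), and if() rejects a zero-length condition.
msg <- tryCatch(
  {
    if (ret == 1) "ok"
  },
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical defensive check: treat a missing or NA status as a failure
# instead of letting if() stop with an error.
download_ok <- function(status) {
  length(status) == 1L && !is.na(status) && status == 1
}

download_ok(NULL)
download_ok(1)
```

Read this way, the error is a symptom of the failed chunk download reported just before it, so changing files.per.chunk alone would not be expected to remove it if the underlying download keeps failing.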

tiagochst commented 1 year ago

@Mo7tafa Which TCGAbiolinks version are you using? Your code works for me, but I am on Mac/Linux.
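One way to report the exact versions is sketched below, using only base R and utils. The helper report_version is hypothetical; it returns NA when the package is not installed, so the snippet runs on any machine.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package is not installed.
report_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

report_version("TCGAbiolinks")  # package version, or NA if not installed
R.version.string                # R version
Sys.info()[["sysname"]]         # operating system name
```

Posting these three values alongside the query makes it easier to compare a failing machine with a working one.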

Mo7tafa commented 1 year ago

@tiagochst I updated the package yesterday, so I am sure it is the latest version! The strange part is that when I tested the code on another computer (also Windows, same R version, same internet connection) it worked well, but it still does not run on mine. I don't know where the problem is.

Mo7tafa commented 1 year ago

[Image attachment]

jowkar commented 1 year ago

I have a similar issue. Even using the example provided in the documentation for the function GDCdownload gives this error:

query <- GDCquery(
  project = "TCGA-ACC",
  data.category = "Copy Number Variation",
  data.type = "Copy Number Segment"
)

GDCdownload(query, files.per.chunk = 1)

Output:

Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
  |====================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
  |====================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero

I also get this error when trying to download open-access gene expression quantification data from another cohort.

I have tried both the GitHub development version and the current release on Bioconductor (v2.28.2).

tiagochst commented 1 year ago

@jowkar If you set GDCdownload(query, files.per.chunk = 2), does it work?

jowkar commented 1 year ago

No, I get the same error. I've tried various values for this parameter, including 5 and 100, as well as the default. This is the output for files.per.chunk = 2:

Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB     Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB     Download completed
Error in if (ret == 1) break : argument is of length zero

Mo7tafa commented 1 year ago

I have come to realize that this error arises in three situations:

  1. Your data may be controlled. If you set the access argument to "Open", you will be all right.
  2. Your internet connection is unstable and your chunks are too large, so you need to download in smaller chunks, such as 2 or more files per chunk.
  3. The cache (free storage) on the machine you are using is insufficient. Set your working directory to an empty drive, keep the GDCdata folder inside it, and you will be all right.

jowkar commented 1 year ago

With the latest development version from GitHub, the TCGA-ACC query for CNV data now works for me. However, the query I actually want to run is the one below, and it still fails on a laptop with a high-speed wired internet connection (about 280 Mbps download speed) and 400 GB of free disk space. I also installed this same version of TCGAbiolinks on a separate server, where the query downloaded the data successfully (attempts with a previous version of the package did not work on that server, however). So although I managed to get the data on the other system, I still think there is a bug here. Note that I both pass access = "open" and set files.per.chunk = 473, which is the total number of files in the cohort, so only one chunk (about 2 GB) is downloaded, and it still fails.

query <- GDCquery(
  project = "TCGA-SKCM", 
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  access = "open"
)

GDCdownload(query, method = "api", files.per.chunk = 473)

Downloading data for project TCGA-SKCM
GDCdownload will download 473 files. A total of 2.002491876 GB
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB     Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB     Download completed
Error in if (ret == 1) break : argument is of length zero

ankushs0128 commented 10 months ago

I am also having the same issue with MMRF-COMMPASS data.

MM <- GDCquery(
  project = "MMRF-COMMPASS",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts",
  access = "open"
)
GDCdownload(MM, method = "api", files.per.chunk = 1)
data <- GDCprepare(MM)

The error appears when the third iteration of the download starts:

Downloading data for project MMRF-COMMPASS
GDCdownload will download 859 files. A total of 3.625715802 GB
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
  |============================================================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
  |============================================================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero

Any pointers on how to resolve this error?