Open swapnil-nis opened 2 years ago
Hi, Which version of TCGAbiolinks are you using?
This data is controlled. This should not be accessible without a token. Also the data seems to be either a maf file or vcf.
I have the same issue! I have checked the samples I want to download a couple of times, and there seems to be nothing wrong with them. Still, I can't figure out what is going wrong.
@Mo7tafa This is controlled data. You need to provide the token.file parameter to GDCdownload, or export the manifest and use a GDC token to download controlled data.
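A minimal sketch of the token route, assuming the token was downloaded from the GDC portal and saved to a local file. The path `gdc-user-token.txt` and the helpers `check_token_file` and `download_controlled` are hypothetical; `token.file` is the GDCdownload argument mentioned above.

```r
# Hypothetical helper: validate the token file before handing it to GDCdownload.
check_token_file <- function(token_path) {
  if (!file.exists(token_path)) {
    stop("Token file not found: ", token_path)
  }
  if (file.size(token_path) == 0) {
    stop("Token file is empty: ", token_path)
  }
  invisible(normalizePath(token_path))
}

# Hypothetical wrapper around the download step for controlled data.
# Assumption: the token is stored in "gdc-user-token.txt" in the working directory.
download_controlled <- function(query, token_path = "gdc-user-token.txt") {
  token <- check_token_file(token_path)
  TCGAbiolinks::GDCdownload(query, token.file = token, method = "api")
}
```

For the manifest route, TCGAbiolinks also has getManifest(query, save = TRUE), which should write a manifest file that the GDC Data Transfer Tool can use together with the token.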
I've checked my data and none of it is controlled! I also used the 'access = "Open"' argument in GDCquery to make sure it doesn't involve controlled data!
Please could you post the query function?
@tiagochst I am using R 4.2.3 on Windows
library("TCGAbiolinks")
NewData <- GDCquery(project = "TCGA-UVM",data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts", access = "Open")
GDCdownload(NewData, method = "api")
I also tried
GDCdownload(query = NewData, method = "api", files.per.chunk = 1)
but I still get the same problem: Error in if (ret == 1) break : argument is of length zero
@Mo7tafa Which TCGAbiolinks version? Your code is working for me, but I am using Mac/Linux.
@tiagochst Actually, I updated the package yesterday, so I am sure it is the latest version! The strange part is that when I tested the code on another computer (still Windows, same R version, same internet connection) it worked well, but it still does not run for me! I don't know where the problem is.
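Since the version question comes up repeatedly in this thread, here is a small sketch for collecting the exact package version, R version, and operating system in one go. The helper names `pkg_version` and `report_environment` are hypothetical; `packageVersion` and `Sys.info` are base R.

```r
# Report the installed version of a package, or NA if it is not installed.
# system.file() returns "" for a missing package and does not load anything.
pkg_version <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

# Collect the details usually requested in a bug report.
report_environment <- function(pkg = "TCGAbiolinks") {
  list(
    package_version = pkg_version(pkg),
    r_version = R.version.string,
    os = Sys.info()[["sysname"]]
  )
}

print(report_environment())
```

Posting this output (or the full sessionInfo()) from both the working and the failing machine makes it easier to spot a difference between them.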
I have a similar issue. Even using the example provided in the documentation for the function GDCdownload gives this error:
query <- GDCquery(
project = "TCGA-ACC",
data.category = "Copy Number Variation",
data.type = "Copy Number Segment"
)
GDCdownload(query, files.per.chunk = 1)
Output:
Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
|====================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
|====================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero
I also get this error when trying to download open-access gene expression quantification data from another cohort.
I have tried both the GitHub development version and the current release on Bioconductor (v2.28.2).
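For context, `argument is of length zero` is the message R raises when the condition of an `if` has length zero, for example when the compared value is `NULL`. The sketch below reproduces that behaviour in isolation. What `ret` actually holds inside GDCdownload is not shown in this thread, so treating it as `NULL` is an assumption, and `safe_is_one` is a hypothetical helper.

```r
# Assumption: `ret` ends up NULL (or otherwise zero-length) after a failed step.
ret <- NULL

# NULL == 1 evaluates to logical(0), which `if` cannot use as a condition.
msg <- tryCatch(
  {
    if (ret == 1) "done" else "retry"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive comparison: isTRUE() is FALSE for zero-length input,
# so the check falls through instead of raising an error.
safe_is_one <- function(x) isTRUE(x == 1)
print(safe_is_one(ret))
print(safe_is_one(1))
```

If this reading is right, the message is a symptom of an earlier step returning nothing, rather than the root cause of the failed download.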
@jowkar If you set GDCdownload(query, files.per.chunk = 2), does it work?
No, I get the same error. I've tried various values for this parameter, including 5 and 100 as well, besides also the default value. This is the output for files.per.chunk = 2:
Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB Download completed
Error in if (ret == 1) break : argument is of length zero
I have come to realize this error depends on 3 situations :
1. Your data may be controlled. If you set the access argument to "Open" you will be all right.
2. Your internet connection is unstable and your chunks are large, so you need to download in smaller chunks, for example files.per.chunk = 2 or slightly more.
3. The cache on the machine you are using is not large enough! Set your working directory to an empty drive, with the GDCdata folder inside it, and you will be all right.
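The three points above can be combined into one call pattern. This is a sketch, not a confirmed fix: `with_retries` and `download_open_data` are hypothetical helpers, and the directory path and chunk size are placeholder values. `access`, `files.per.chunk`, and `directory` are existing GDCquery/GDCdownload arguments.

```r
# Hypothetical helper: run `action` up to `attempts` times, pausing between tries.
with_retries <- function(action, attempts = 3, wait_seconds = 0) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = action()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    message("Attempt ", i, " failed: ", conditionMessage(last_error))
    if (i < attempts) Sys.sleep(wait_seconds)
  }
  stop("All ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Sketch of a download that applies the three suggestions.
# Assumption: "D:/GDCdata" is a placeholder for a drive with enough free space.
download_open_data <- function(project, data_dir = "D:/GDCdata") {
  query <- TCGAbiolinks::GDCquery(
    project = project,
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    workflow.type = "STAR - Counts",
    access = "open"              # 1. restrict the query to open-access files
  )
  with_retries(function() {
    TCGAbiolinks::GDCdownload(
      query,
      method = "api",
      files.per.chunk = 2,       # 2. keep each chunk small
      directory = data_dir       # 3. write to a drive with free space
    )
  })
}
```

A call such as download_open_data("TCGA-UVM") would then run the whole sequence. Retrying only helps with transient failures; if the error is deterministic on a given machine, as some reports in this thread suggest, the wrapper will just fail after the last attempt.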
With the latest development version from GitHub, the TCGA-ACC query for CNV data works for me now. However, the query I actually wanted to run is the one below, and it still fails on a laptop with a high-speed wired internet connection (ca. 280 Mbps download speed) and 400 GB of free space. I also tried installing this same version of TCGAbiolinks on a separate server, where the query downloaded the data successfully (attempts with a previous version of the package did not work on this server, however). So while I managed to get the data on that other system, I still think there is a bug here. Note that I both pass access = "open" and set files.per.chunk = 473, which is the total number of files in the cohort, so only one chunk (about 2 GB) is downloaded, and it still fails.
query <- GDCquery(
project = "TCGA-SKCM",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
access = "open"
)
GDCdownload(query, method = "api", files.per.chunk = 473)
Downloading data for project TCGA-SKCM
GDCdownload will download 473 files. A total of 2.002491876 GB
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB Download completed
Error in if (ret == 1) break : argument is of length zero
I am also having the same issue with MMRF-COMMPASS data.
MM <- GDCquery(
project = "MMRF-COMMPASS",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "STAR - Counts",
access = "open"
)
GDCdownload(MM, method = "api", files.per.chunk = 1)
data <- GDCprepare(MM)
The error appears when the third download iteration starts:
Downloading data for project MMRF-COMMPASS
GDCdownload will download 859 files. A total of 3.625715802 GB
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
|============================================================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
|============================================================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero
Any pointers for resolving the error?
I have been trying to download SNV data for TCGA-HNSC for the last two months, but I have not been able to. Every time, I get the same error, appended below:
gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 3 (1970 files, size = 1.002520368 GB) as Mon_May__2_21_37_55_2022_0.tar.gz
|======================================================================| 100%
/bin/tar: This does not look like a tar archive
gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
Error in if (ret == 1) break : argument is of length zero
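The tar/gzip messages say that the file saved as `.tar.gz` was not a gzip archive. One way to check this directly is to look at the first two bytes of the file, which are `1f 8b` for gzip. A minimal sketch with hypothetical helper names; the idea that the server returned something other than an archive (for example an error message) is an assumption that this thread does not confirm.

```r
# Return TRUE if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip_file <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  header <- readBin(path, what = "raw", n = 2)
  length(header) == 2 && identical(header, as.raw(c(0x1f, 0x8b)))
}

# Show the first bytes of a file as text, to see what was actually downloaded.
# Null bytes are dropped because rawToChar() cannot convert them.
peek_file <- function(path, n = 200) {
  bytes <- readBin(path, what = "raw", n = n)
  rawToChar(bytes[bytes != as.raw(0)])
}
```

If is_gzip_file() returns FALSE for the downloaded chunk, peek_file() on the same path may show a readable message that explains why the download was rejected.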