bhklab / PharmacoGx

R package to analyze large-scale pharmacogenomic datasets.
http://pharmacodb.pmgenomics.ca
GNU General Public License v3.0

Can't download PharmacoSets and perturbation signatures #20

Open enricoferrero opened 7 years ago

enricoferrero commented 7 years ago

I'm having trouble downloading the LINCS L1000 perturbation signatures:

> l100.genetic.pertsig <- downloadPertSig("L1000_genetic")
trying URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/PSets.csv'
Content type 'unknown' length 1776 bytes
==================================================
downloaded 1776 bytes

downloaded 0 bytes

Error in download.file(url, method = method, ...) : 
  cannot download all files
In addition: Warning message:
In download.file(url, method = method, ...) :
  URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads//L1000_genetic_signatures.RData': status was '404 Not Found'
> l100.compounds.pertsig <- downloadPertSig("L1000_compounds")
trying URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/PSets.csv'
Content type 'unknown' length 1776 bytes
==================================================
downloaded 1776 bytes

downloaded 0 bytes

Error in download.file(url, method = method, ...) : 
  cannot download all files
In addition: Warning message:
In download.file(url, method = method, ...) :
  URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads//L1000_compounds_signatures.RData': status was '404 Not Found'

I also couldn't download the CMap and the L1000 genetic PharmacoSets (but with a different kind of error):

> cmap.pset <- downloadPSet("CMAP")
trying URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/PSets.csv'
Content type 'unknown' length 1776 bytes
==================================================
downloaded 1776 bytes

Error: error reading from connection
> l100.genetic.pset <- downloadPSet("L1000_genetic")
trying URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/PSets.csv'
Content type 'unknown' length 1776 bytes
==================================================
downloaded 1776 bytes

Error: error reading from connection

Finally, trying to download the L1000 compounds PharmacoSet gives me a different error:

> l100.compounds.pset <- downloadPSet("L1000_compounds")
trying URL 'https://www.pmgenomics.ca/bhklab/sites/default/files/downloads/PSets.csv'
Content type 'unknown' length 1776 bytes
==================================================
downloaded 1776 bytes

Error: cannot allocate vector of size 6.5 Gb

All in all, the only dataset I could download was the CMap perturbation signatures.

Thanks!

p-smirnov commented 7 years ago

@enricoferrero Can you check the PSets folder created in the directory you were working in? Are there files with the names of the datasets in that folder? If so, could you delete them and try to download again using downloadPSet("CMAP")?

The "error reading from connection" error often happens when there are interrupted downloads or incomplete files in the folder. We have yet to implement a check for complete downloads (it is on our todo list!).
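The cleanup step described above can be sketched in base R. This is only an illustration: `remove_cached_pset` is a hypothetical helper, not a PharmacoGx function, and it assumes the downloaded files sit in a `PSets` folder under the working directory with file names that start with the dataset name. It defaults to a dry run so that nothing is deleted until the matches have been checked.

```r
# Hypothetical helper (not part of PharmacoGx): find, and optionally delete,
# files in a download folder whose names start with a dataset name.
remove_cached_pset <- function(dataset, dir = "PSets", dry_run = TRUE) {
  # Nothing to do if the folder was never created
  if (!dir.exists(dir)) {
    return(character(0))
  }
  files <- list.files(dir, full.names = TRUE)
  # Keep regular files only, then match on the file name prefix
  files <- files[!dir.exists(files)]
  hits <- files[startsWith(basename(files), dataset)]
  if (!dry_run && length(hits) > 0) {
    file.remove(hits)
  }
  # Return the matching paths so the caller can see what was (or would be) removed
  hits
}

# Example calls (commented out so nothing is deleted by accident):
# remove_cached_pset("CMAP")                   # list matching files only
# remove_cached_pset("CMAP", dry_run = FALSE)  # delete them, then retry downloadPSet("CMAP")
```

Matching on the file name prefix is an assumption about how the files are named, so it is worth inspecting the dry-run result before deleting anything.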

The last error with L1000_compounds seems to be different. Are you on a 64-bit machine? How much memory do you have installed? The L1000 dataset is quite large (~8 GB), and R is not the most memory-efficient language.

enricoferrero commented 7 years ago

@p-smirnov Thanks, yes, I think I managed to solve the "Error: error reading from connection" by deleting the partial files, and the "Error: cannot allocate vector of size 6.5 Gb" by using a machine with more RAM.

However, I'm still getting the 404 error for downloadPertSig("L1000_genetic") and downloadPertSig("L1000_compounds").

Can you please have a look? Thanks!

p-smirnov commented 7 years ago

@enricoferrero: It seems that the L1000_compounds signatures are not available on our server. I will fix this and let you know.

For the genetic perturbations in the L1000, we are not sure it makes sense to apply our regression method to the overexpression and knockdown experiments, especially given the multitude of off-target effects encountered when using shRNA. Therefore we did not create such an object, but the L1000_genetic PharmacoSet contains all the data needed to do so, or to use in your own pipeline.

Finally, please note that we have currently integrated only the Phase 1 data, where most of the compound experiments were done. We are working to add the later-generated L1000 data to our data structure.

kkmkoudijs commented 7 years ago

Hi, I'm also waiting for the L1000_compounds signatures to be put on the server. Maybe this message will move it up the priority list. Besides this, it's a great package and I will cite it in my next publication :-) Keep up the good work!

KevCYang commented 4 years ago

Hi, have the L1000_compounds and L1000_genetic perturbation signatures been made available? I tried downloadPertSig("CMAP"), downloadPertSig("L1000_genetic"), and downloadPertSig("L1000_compounds"), but only the CMAP perturbation signature could be found and downloaded from source.

Thanks!

Cameron-IT commented 3 years ago

Hello, I have a similar issue. It looks like CMAP is not an available PharmacoSet. I get the following error when trying to download CMAP:

downloadPSet("CMAP")

Error in downloadPSet("CMAP") : 
  Unknown Dataset. Please use the availablePSets() function for the table of available PharamcoSets.

I get the following output (truncated) when retrieving the list of available sets:

availablePSets()

   Dataset Name             Date Created         PSet Name      version        type
1          CCLE 2020-06-24T14:39:26.588Z         CCLE_2015         2015 sensitivity
2        CTRPv2 2020-06-24T14:39:26.588Z       CTRPv2_2015         2015 sensitivity
3          FIMM 2020-06-24T14:39:26.588Z         FIMM_2016         2016 sensitivity
4          gCSI 2021-06-11T21:58:16.390Z         gCSI_2019         2019        <NA>
5          GDSC 2020-06-24T14:39:26.588Z GDSC_2020(v2-8.2) 2020(v2-8.2) sensitivity
6          GDSC 2020-06-24T14:39:26.588Z GDSC_2020(v1-8.2) 2020(v1-8.2) sensitivity
7          GRAY 2021-02-23T14:39:26.588Z         GRAY_2017         2017 sensitivity
8         NCI60 2021-08-18T16:28:45.207Z        NCI60_2021         2021 sensitivity
9         PRISM 2021-08-18T16:28:45.207Z        PRISM_2020         2020 sensitivity
10    UHNBreast 2020-06-24T14:39:26.588Z    UHNBreast_2019         2019        both

I am trying to work with CMAP perturbation signatures to get connectivity scores between pharmacologics in CMAP and my disease state data. Your assistance would be greatly appreciated.

ChristopherEeles commented 3 years ago

Hi @Cameron-IT

This was kind of resolved in #94.

We added a new parameter to availablePSets, canonical, which defaults to TRUE. This filters the list of PSets down to those whose molecular profiles have been recomputed by us; these are of the highest quality and are therefore considered canonical.

Please remember to consult the package documentation when you encounter an issue. In this case, you could read about the parameter change in ?availablePSets.

Regarding CMAP, the download function now uses the PSet Name column from availablePSets(), since the Dataset Name column is not unique. This is also documented in ?downloadPSet.

See:

> availablePSets(canonical=FALSE)[, c(1, 3)]
   Dataset Name                    PSet Name
1     UHNBreast               UHNBreast_2019
2          CCLE                    CCLE_2015
3          GDSC            GDSC_2019(v1-8.0)
4     UHNBreast    UHNBreast_2019_unfiltered
5          GRAY                    GRAY_2013
6          GDSC GDSC_2020(v2-8.2)_unfiltered
7          CCLE         CCLE_2015_unfiltered
8          gCSI                    gCSI_2019
9          gCSI                     gcsi_cnv
10         FIMM                    FIMM_2016
11         GDSC            GDSC_2019(v2-8.0)
12         GRAY                    GRAY_2017
13         gCSI                    gCSI_2017
14      BeatAML                 BeatAML_2018
15         GDSC            GDSC_2020(v2-8.2)
16       CTRPv2                  CTRPv2_2015
17         GDSC            GDSC_2020(v1-8.2)
18       CTRPv2            CTRPv2-unfiltered
19         GDSC    GDSC1_unfiltered_oldarray
20         CMAP                    CMAP_2016
21         gCSI         gCSI_2018_unfiltered
22        Tavor                   Tavor_2020
23        PRISM                   PRISM_2020
24        NCI60                   NCI60_2021

Based on the above table the solution to your problem is:

CMAP <- downloadPSet("CMAP_2016")
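
Since one Dataset Name can map to several PSet Names, the lookup can also be done programmatically rather than by reading the table. The sketch below is hedged: `pset_names_for` is a hypothetical helper, not part of PharmacoGx, and it assumes the table returned by availablePSets() has columns named exactly "Dataset Name" and "PSet Name", as the printed output suggests. A small stand-in table with rows copied from the listing above is used so that the example runs without the package or a network connection.

```r
# Hypothetical helper (not part of PharmacoGx): return every PSet Name
# listed for a given Dataset Name in a table shaped like availablePSets() output.
pset_names_for <- function(psets, dataset) {
  # which() drops rows where the dataset name is NA
  psets[["PSet Name"]][which(psets[["Dataset Name"]] == dataset)]
}

# Stand-in table with three rows copied from the listing above
psets <- data.frame(
  "Dataset Name" = c("GDSC", "GDSC", "CMAP"),
  "PSet Name" = c("GDSC_2020(v2-8.2)", "GDSC_2020(v1-8.2)", "CMAP_2016"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

pset_names_for(psets, "CMAP")  # one match
pset_names_for(psets, "GDSC")  # two matches, so a specific PSet Name must be chosen

# With the package loaded, the real table would come from:
# psets <- availablePSets(canonical = FALSE)
```

Any name returned this way can then be passed to downloadPSet(), as in the CMAP_2016 call above.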

Best, Chris

ChristopherEeles commented 3 years ago

Hi @KevCYang,

Has your problem with perturbation signature download been resolved? @p-smirnov

Best, Chris

Cameron-IT commented 3 years ago

@ChristopherEeles thank you! I'm not sure how I missed that, but the problem has been solved.