Bioconductor / ExperimentHub

Client to access ExperimentHub resources
https://bioconductor.org/packages/ExperimentHub

Using R 4.0.3 seem to have issue in extracting FaFile #15

Closed (cying111 closed this issue 4 years ago)

cying111 commented 4 years ago

Hi, for our package NanoporeRNASeq we uploaded a FaFile to ExperimentHub. With R 4.0.0 and R 4.0.2, the following line retrieved it without problems:

genomeSequence <- query(ExperimentHub(), c("NanoporeRNA", "GRCh38", "FaFile"))

However, running the same call under R 4.0.3 returns an empty record.

Does anyone know why? Thank you, Ying

lshep commented 4 years ago

To be truthful, I'm actually surprised it worked before. The query function searches the mcols of a hub object, and these do not normally include the dispatchclass. Judging from your metadata file, the dispatchclass is the only field that explicitly says "FaFile". I would suggest searching for "FASTA" instead, which should still return the record you want.

> query(hub, c("NanoporeRNA", "GRCh38"))
ExperimentHub with 7 records
# snapshotDate(): 2020-10-02
# $dataprovider: SGNex
# $species: Homo sapiens
# $rdataclass: vector
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH3808"]]' 

           title                           
  EH3808 | K562_directcDNA_replicate1      
  EH3809 | K562_directcDNA_replicate4      
  EH3810 | K562_directRNA_replicate6       
  EH3811 | MCF7_directcDNA_replicate1      
  EH3812 | MCF7_directcDNA_replicate3      
  EH3813 | MCF7_directRNA_replicate4       
  EH3814 | Hs_GRCh38_chr22_1_25409234_fasta

You can see the columns that are queried by calling mcols on the result:

mcols(query(hub, c("NanoporeRNA", "GRCh38")))
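To make the point about queried columns concrete, here is a minimal sketch of how one might inspect that metadata. It assumes the ExperimentHub package is installed and a network connection is available; the variable names `recs` and `md` are only illustrative, and the records returned depend on the hub snapshot in use.

```r
library(ExperimentHub)

hub <- ExperimentHub()

# Same two-term query as above (no "FaFile" term).
recs <- query(hub, c("NanoporeRNA", "GRCh38"))

# mcols() returns a DataFrame with one row per record;
# these are the columns that query() searches.
md <- mcols(recs)
colnames(md)

# Compare the columns where a file-type term could match.
# Per the reply above, "FaFile" is only the dispatch class and is
# not expected here, while "FASTA" is listed as the sourcetype.
md[, c("title", "sourcetype", "rdataclass")]
```

If "FaFile" does not appear in any of these columns, a query that includes it would be expected to return no records.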

As mentioned, changing "FaFile" to "FASTA" in the query should give you the desired result. If this is used inside your package code, I would recommend retrieving the resource by its EH ID instead of relying on queries, so that you know exactly which file you are retrieving.

> hub['EH3814']
ExperimentHub with 1 record
# snapshotDate(): 2020-10-02
# names(): EH3814
# package(): NanoporeRNASeq
# $dataprovider: SGNex
# $species: Homo sapiens
# $rdataclass: vector
# $rdatadateadded: 2020-10-02
# $title: Hs_GRCh38_chr22_1_25409234_fasta
# $description: Sequences of region chr22 1 to 25409234 in human GRCh38 DNA ...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: FASTA
# $sourceurl: https://github.com/GoekeLab/sg-nex-data
# $sourcesize: NA
# $tags: c("ExperimentHub", "RNASeqData", "SequencingData") 
# retrieve record with 'object[["EH3814"]]' 
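The two suggestions in this reply can be sketched as follows. This is an illustration only: it assumes network access, assumes the ID EH3814 still points at the same resource in the current snapshot, and the object class noted in the comments is inferred from the thread, not verified.

```r
library(ExperimentHub)

hub <- ExperimentHub()

# Option 1: search on the source type instead of the dispatch class.
fasta_hits <- query(hub, c("NanoporeRNA", "GRCh38", "FASTA"))
length(fasta_hits)  # number of matching records

# Option 2: address the record directly by its ID.
# Single brackets subset the hub and show metadata only.
hub["EH3814"]

# Double brackets download (or load from the local cache) the resource.
# According to the thread the dispatch class is FaFile, so the result
# is presumably an Rsamtools FaFile object.
genomeSequence <- hub[["EH3814"]]
class(genomeSequence)
```

Using the ID avoids any dependence on which metadata columns `query()` happens to search, which is what changed behaviour between R versions in this report.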

Cheers,