Bioconductor / ExperimentHub

Client to access ExperimentHub resources
https://bioconductor.org/packages/ExperimentHub

ExperimentHub resource EH1039 mislabeled as "RLE-compressed" #3

Closed hpages closed 6 years ago

hpages commented 6 years ago

Not sure where to open an issue for this. If you'd prefer this to go under https://github.com/Bioconductor/ExperimentHubData or https://github.com/Bioconductor/HubServer let me know and I'll move the issue there.

> library(ExperimentHub)
> hub <- ExperimentHub()
snapshotDate(): 2018-08-03
> query(hub, "TENxBrainData")
ExperimentHub with 4 records
# snapshotDate(): 2018-08-03 
# $dataprovider: 10X Genomics
# $species: Mus musculus
# $rdataclass: character
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["EH1039"]]' 

           title                                           
  EH1039 | Brain scRNA-seq data, 'RLE-compressed'          
  EH1040 | Brain scRNA-seq data, 'rectangular'             
  EH1041 | Brain scRNA-seq data, sample (column) annotation
  EH1042 | Brain scRNA-seq data, gene (row) annotation     

This dataset uses a sparse matrix representation on top of HDF5. This is different from RLE compression. You could label it "HDF5-based sparse matrix representation" or "HDF5-based 10X Genomics format" or "HDF5-based sparse matrix" or just "sparse matrix" if horizontal space is an issue.
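To illustrate the distinction drawn above, here is a minimal base-R sketch contrasting run-length encoding with a sparse representation that stores only the nonzero entries. The toy matrix `m` and the triplet layout are made up for illustration; this is not a description of how the EH1039 file is actually laid out on disk.

```r
# Toy count matrix with mostly zeros (values are hypothetical).
m <- matrix(c(0, 0, 3, 0,
              0, 0, 0, 0,
              5, 0, 0, 1), nrow = 3, byrow = TRUE)

# Run-length encoding: collapse consecutive repeats of a value
# (in column-major order) into value/length pairs.
r <- rle(as.vector(m))

# Sparse (triplet) representation: keep only the positions and
# values of the nonzero entries.
nz <- which(m != 0, arr.ind = TRUE)
triplets <- data.frame(row = nz[, "row"], col = nz[, "col"], value = m[nz])

# Both representations can be expanded back to the same dense matrix.
dense <- matrix(0, nrow = nrow(m), ncol = ncol(m))
dense[cbind(triplets$row, triplets$col)] <- triplets$value
```

RLE records runs of any repeated value, whereas the sparse form records nothing at all for zeros, so the two are different encodings even when both happen to shrink a zero-heavy matrix.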

Also, the presence of "rectangular" in the label of EH1040 doesn't really help in figuring out the difference between this resource and EH1039. The 2 resources represent the Brain scRNA-seq dataset in different formats, but the dataset is rectangular regardless of the format. A better description for EH1040 would be "dense matrix" representation, as opposed to EH1039, which uses a "sparse matrix" representation. Both representations are HDF5-based.

Thanks!

lshep commented 6 years ago

The entries in ExperimentHub are based on what the user provides in the metadata.csv file. This should be opened as an issue on the TENxBrainData package. We don't determine this. It should be updated in the package, and then an update in the Hub should be requested.

mtmorgan commented 6 years ago

Actually (sheepishly, is there an emoji for that?) 'we' (i.e., me) did; can we talk about this tomorrow?

lshep commented 6 years ago

Fixed on our end -