Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

File caching behavior and implementation #45

Closed seandavi closed 6 years ago

seandavi commented 7 years ago

This is related to, but distinct from, the discussion in #40, so I am opening a new issue here. @mtmorgan pointed out that BiocFileCache fills at least part of this need.

The GDC uses UUIDs for everything, including files, and they uniquely identify resources in the GDC. As such, the file UUID is an ideal key in any local cache. The UUIDs also disambiguate files that share the same name, so incorporating them into the local file path is likely useful.

I would envision, then, keys that look like 7cde9495-e573-4b38-b89c-991076cf8cf8 and file paths inside the BiocFileCache that look something like 7cde9495-e573-4b38-b89c-991076cf8cf8/originalfilename.txt. The original filename is important as some functions rely on file suffixes.
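
To make the proposed layout concrete, here is a minimal, language-neutral sketch in Python. It is not the GenomicDataCommons or BiocFileCache implementation. The helper names `cache_relpath` and `cached_file` and the `cache_root` argument are hypothetical, and the sketch assumes the cache is a plain directory tree with one subdirectory per file UUID.

```python
import uuid
from pathlib import Path, PurePosixPath


def cache_relpath(file_uuid: str, original_filename: str) -> PurePosixPath:
    """Return '<uuid>/<original filename>' for a file (hypothetical helper)."""
    # Validate and normalise the key; uuid.UUID raises ValueError for a non-UUID.
    key = str(uuid.UUID(file_uuid))
    # Keep only the final path component and preserve the suffix as-is.
    name = PurePosixPath(original_filename).name
    if name in ("", ".."):
        raise ValueError("original filename must name a file")
    return PurePosixPath(key) / name


def cached_file(cache_root: Path, file_uuid: str, original_filename: str):
    """Return (path, is_cached) under cache_root (hypothetical helper)."""
    path = cache_root / cache_relpath(file_uuid, original_filename)
    return path, path.is_file()
```

Under these assumptions, a lookup is a single existence check on a path derived from the UUID, and two files with the same name never collide because each sits in its own UUID directory. The original filename, suffix included, stays intact for functions that rely on it.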