Open HenrikBengtsson opened 9 years ago
When it comes to annotation data, in addition to a formal organism label, should the genome assembly label become a first-class citizen? For example:
annotationData/organisms/Homo_sapiens/GRCh36/
annotationData/organisms/Homo_sapiens/GRCh37/
annotationData/organisms/Homo_sapiens/GRCh38/
annotationData/organisms/Mus_musculus/GRCm37/
annotationData/organisms/Mus_musculus/GRCm38/
Then one could look up annotation data as:
fa <- FastaReferenceFile(organism="Homo_sapiens", assembly="GRCh38")
Note that it should be allowed to have tags in assembly directory names, e.g.
annotationData/organisms/Homo_sapiens/GRCh37,hg19/
and still have the above lookup find it. It's only the GRC label that needs to be unique.
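A minimal sketch of how such a tag-tolerant lookup could work, assuming the label is everything before the first comma in the directory name. The helper `findAssemblyDir()` is hypothetical and only illustrates the matching rule; it is not an existing function in any package.

```r
# Hypothetical helper (an assumption for illustration, not an existing API).
# Finds the directory under <root>/<organism>/ whose first comma-separated
# label equals 'assembly'; any trailing tags (e.g. ",hg19") are ignored.
findAssemblyDir <- function(root, organism, assembly) {
  orgPath <- file.path(root, organism)
  dirs <- list.dirs(orgPath, full.names = FALSE, recursive = FALSE)
  # Strip tags: "GRCh37,hg19" becomes "GRCh37"
  labels <- sub(",.*$", "", dirs)
  hits <- dirs[labels == assembly]
  if (length(hits) == 0L) stop("No directory found for assembly: ", assembly)
  if (length(hits) > 1L) stop("Assembly label is not unique: ", assembly)
  file.path(orgPath, hits)
}

# Usage on a throwaway directory tree
root <- file.path(tempfile("annotationData"), "organisms")
dir.create(file.path(root, "Homo_sapiens", "GRCh37,hg19"), recursive = TRUE)
dir.create(file.path(root, "Homo_sapiens", "GRCh38"), recursive = TRUE)
path <- findAssemblyDir(root, "Homo_sapiens", "GRCh37")
print(basename(path))
```

Failing loudly when two directories share a label enforces the uniqueness requirement instead of silently picking one.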
It might be that one has multiple sub alternatives, e.g.
annotationData/organisms/Homo_sapiens/GRCh37,hg19/Ensembl/71/
annotationData/organisms/Homo_sapiens/GRCh37,hg19/Ensembl/75/
Then the following request is ambiguous (unless one defines some unique ordering and picks the "most recent" one):
gtf <- GtfDataFile(organism="Homo_sapiens", assembly="GRCh37")
# Or equivalently
gtf <- GtfDataFile(organism=organism(fa), assembly=assembly(fa))
# Or short
gtf <- GtfDataFile(organism=fa)
To specify Ensembl release 75, one could use:
gtf <- GtfDataFile(organism="Homo_sapiens", assembly="GRCh37", sub=c("Ensembl", "75"))
# Or equivalently
gtf <- GtfDataFile(organism=fa, sub=c("Ensembl", "75"))
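One possible way to handle the ambiguity is sketched below. It assumes release directories have purely numeric names and that "most recent" means the highest number. The function `resolveRelease()` and its `source`, `release` and `pickLatest` arguments are hypothetical and are not part of the proposed `GtfDataFile()` interface.

```r
# Hypothetical helper (an assumption for illustration, not an existing API).
# Resolves <assemblyPath>/<source>/<release>. Without an explicit release,
# it fails when several exist, unless pickLatest = TRUE.
resolveRelease <- function(assemblyPath, source, release = NULL, pickLatest = FALSE) {
  srcPath <- file.path(assemblyPath, source)
  releases <- list.dirs(srcPath, full.names = FALSE, recursive = FALSE)
  if (!is.null(release)) {
    if (!release %in% releases) stop("Release not found: ", release)
    return(file.path(srcPath, release))
  }
  if (length(releases) == 0L) stop("No releases found under: ", srcPath)
  if (length(releases) == 1L) return(file.path(srcPath, releases))
  if (!pickLatest) {
    stop("Ambiguous request; candidates: ", paste(releases, collapse = ", "))
  }
  # Compare numerically so that, e.g., "100" ranks above "75"
  nums <- suppressWarnings(as.numeric(releases))
  if (anyNA(nums)) stop("Cannot order non-numeric release names")
  file.path(srcPath, releases[which.max(nums)])
}

# Usage on a throwaway directory tree
asm <- file.path(tempfile("annotationData"), "organisms", "Homo_sapiens", "GRCh37,hg19")
dir.create(file.path(asm, "Ensembl", "71"), recursive = TRUE)
dir.create(file.path(asm, "Ensembl", "75"), recursive = TRUE)
explicit <- resolveRelease(asm, "Ensembl", release = "75")
latest <- resolveRelease(asm, "Ensembl", pickLatest = TRUE)
print(basename(explicit))
print(basename(latest))
```

Making the "most recent" behavior opt-in keeps the default strict, so an ambiguous request is reported instead of silently resolved.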