mtmorgan opened this issue 7 years ago.
I agree, that's a very elegant solution. I'll check whether I can use a similar implementation in ensembldb.
OK, it should be easy to reuse the concept from Organism.dplyr in ensembldb; actually, I could just import all of the filters I need.
Two things, however:
1. Should the filters rather live upstream, e.g. in AnnotationDbi? We could run into circular imports if ensembldb imports Organism.dplyr and Organism.dplyr eventually has to import ensembldb's EnsDb class later.
2. Tx_idFilter: I don't really like the mixture of CamelCase and snake_case. What about naming the filter e.g. TxidFilter and having a package-internal mapping from the field name extracted from the filter object name (txid) to the database column name (tx_id)?
I agree that they should be upstream. AnnotationDbi seems like a very heavy package. Is it sensible to introduce a Filters or AnnotationFilters package?
I also don't like the snake_Camel notation; probably it is staying too close to the original (TxDb) schema.
@lawremi, is there any hope of using the S4Vectors Filter infrastructure? My reservations are that it seems a little heavy for the current use, and that I have sometimes found myself in a place (sorry for the vagueness) where I could not easily implement my own filter (something about evaluation environments?) and had to dig up a multi-year-old email from you for direction.
An AnnotationFilters package would be great! It might also be very helpful for users to have a central entry point to the filters.
> I also don't like the snake_Camel notation; probably it is staying too close to the original (TxDb) schema.
Yes, it would be nice to remove all underscores (_) from the database column names when generating the name of the filter object. To map them back I see two options. The heavy one, which I'm currently using in ensembldb, is a dedicated column method that returns the correct database column. The second option would be a more lightweight function that uses a character vector mapping database column names to Filter object names.
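A minimal sketch of the lightweight option, written in Python purely as a language-neutral illustration (it is not ensembldb or Organism.dplyr code; in R the lookup table would be a named character vector rather than a dict). The column names and the helpers filter_name and column_for are hypothetical. Because dropping underscores can make two columns collide (e.g. a hypothetical tx_id and txid would both become TxidFilter), the sketch checks for that when it builds the table.

```python
# Hypothetical TxDb-style snake_case column names; illustrative only.
DB_COLUMNS = ["tx_id", "gene_id", "exon_rank", "tx_biotype"]


def filter_name(column: str) -> str:
    """Derive a filter name by dropping underscores, e.g. 'tx_id' -> 'TxidFilter'."""
    field = column.replace("_", "")  # 'tx_id' -> 'txid'
    return field[:1].upper() + field[1:] + "Filter"


# Lightweight option: one lookup table from filter name back to database column.
FILTER_TO_COLUMN = {filter_name(col): col for col in DB_COLUMNS}

# Dropping underscores can map two columns to the same name; fail loudly if so.
if len(FILTER_TO_COLUMN) != len(DB_COLUMNS):
    raise ValueError("two database columns map to the same filter name")


def column_for(filter_cls_name: str) -> str:
    """Return the database column behind a filter name, e.g. 'TxidFilter' -> 'tx_id'."""
    try:
        return FILTER_TO_COLUMN[filter_cls_name]
    except KeyError:
        raise ValueError(f"unknown filter: {filter_cls_name!r}") from None


print(filter_name("exon_rank"))  # ExonrankFilter
print(column_for("TxidFilter"))  # tx_id
```

Compared with a dedicated column method per filter class, the table keeps the mapping in one place, at the cost of having to be regenerated whenever the schema changes.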
Regarding the S4Vectors FilterRules: I had only a quick glance at it and did not see a simple way to use it in ensembldb.
@mtmorgan I really like the idea of an AnnotationFilters package that provides BasicFilter and some default additional filters that could be reused in Organism.dplyr and ensembldb. I think now might also be the best time to start implementing the package; later there might be too many changes that would have to be made in Organism.dplyr. As it is now, loading both Organism.dplyr and ensembldb breaks functionality in both. If you want, I can also contribute to that package.
@jotsetung I started a package and invited you as a collaborator with admin rights: github.com/Bioconductor/AnnotationFilters.
@lawremi Should we be paying attention to S4Vectors::Filter*, or is that too ambitious?
There are filter concepts in S4Vectors, ensembldb, and now here. Shouldn't we have just one? One thing that drove us to implement our own filters rather than reusing those in ensembldb was the ability to easily generate them programmatically, whereas the ensembldb filters are all 'hand-crafted'.
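To illustrate what generating filters programmatically can look like, here is a minimal sketch in Python (a language-neutral illustration, not the actual Organism.dplyr, ensembldb or S4Vectors code). One generic base class holds the comparison logic, and one subclass per database column is created in a loop instead of being written by hand. BasicFilter, make_filters, matches, the column names and the example record are all hypothetical.

```python
import operator
from typing import Any, Dict, Iterable, Type

# Supported conditions, mapped to Python comparison functions.
_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


class BasicFilter:
    """Hypothetical generic filter: compares one database column to a value."""

    column = ""  # set on each generated subclass

    def __init__(self, value: Any, condition: str = "==") -> None:
        if condition not in _OPS:
            raise ValueError(f"unsupported condition: {condition!r}")
        self.value = value
        self.condition = condition

    def matches(self, row: Dict[str, Any]) -> bool:
        """Evaluate the filter against one record (a column -> value dict)."""
        return _OPS[self.condition](row[self.column], self.value)


def make_filters(columns: Iterable[str]) -> Dict[str, Type[BasicFilter]]:
    """Create one BasicFilter subclass per column, e.g. 'tx_id' -> TxidFilter."""
    classes: Dict[str, Type[BasicFilter]] = {}
    for col in columns:
        field = col.replace("_", "")
        name = field[:1].upper() + field[1:] + "Filter"
        # type(name, bases, namespace) builds the class at run time.
        classes[name] = type(name, (BasicFilter,), {"column": col})
    return classes


# Hypothetical column list; in practice it could be read from the database schema.
FILTERS = make_filters(["tx_id", "gene_id", "tx_biotype"])

row = {"tx_id": "tx1", "gene_id": "g1", "tx_biotype": "protein_coding"}
print(FILTERS["TxidFilter"]("tx1").matches(row))  # True
print(FILTERS["TxbiotypeFilter"]("lincRNA").matches(row))  # False
```

Adding a filter for a new column then means adding one entry to the column list rather than writing another class by hand.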