bruth closed this issue 10 years ago
Jeremy can weigh in, but I don't think this has been a problem on the PCGC project. In practice, we haven't had many concepts that require custom formatting. Since 1) we haven't encountered cases where we have foreseen that two or more formatters might reasonably apply to a given concept, and 2) there isn't a way to dynamically upload formatters, there would be no disadvantage for us in wiring formatters to concepts in the code, since we have to modify the code and restart the web server anyway whenever we make a formatter change.
An alternative would be typing the formatters and translators to one degree or another (number or type of fields, or class of formatter) and using that type information in building the drop-downs. Of course, if you wanted to put maximum power in the hands of the user, you'd have to support uploadable formatter code ;-)
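A rough sketch of what such typing could look like, assuming each formatter declares the field names it reads and the number of fields it can handle. `FormatterSpec`, `accepts`, and `dropdown_choices` are hypothetical names for illustration, not part of Avocado:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class FormatterSpec:
    """Hypothetical type information declared by a formatter."""
    name: str
    # Field names the formatter reads; empty means it has no requirements.
    required_fields: FrozenSet[str] = frozenset()
    # Largest number of fields it can handle; None means no limit.
    max_fields: Optional[int] = None

    def accepts(self, concept_fields: List[str]) -> bool:
        """Return True if a concept with these field names is compatible."""
        if self.max_fields is not None and len(concept_fields) > self.max_fields:
            return False
        return self.required_fields.issubset(concept_fields)


def dropdown_choices(specs, concept_fields):
    """Names of the formatters that should be offered for a concept."""
    return [spec.name for spec in specs if spec.accepts(concept_fields)]


SPECS = [
    FormatterSpec("Default"),
    FormatterSpec("Foo/Bar", required_fields=frozenset({"foo", "bar"}), max_fields=2),
]

print(dropdown_choices(SPECS, ["foo", "bar"]))
print(dropdown_choices(SPECS, ["age"]))
```

With something like this, a concept that lacks the required fields would never see the incompatible formatter in its drop-down.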
Thanks for the insight @murphyke. Out of curiosity, what power would you expect from uploadable formatter code? What restrictions would be in place to prevent someone (theoretically) hacking in and uploading arbitrary code? (I've gone against my own rule of getting theoretical).
@murphyke In thinking of an alternative approach, I am sensitive to the fact that PCGC has way more fields and concepts than any other project so far. It seems like an extreme case, but considering there are only about 4 production Harvest apps, it's too small a sample size to tell. Roughly how many fields and concepts are defined for PCGC? How many translators are used? How many custom formatters?
> what power would you expect from uploadable formatter code? What restrictions would be in place?
Hence the smiley in my original.
> How many X?
20 custom formatters. 2 translators. 60 categories. 509 fields. 422 criteria. 370 columns.
@murphyke thanks for the stats
@murphyke What is your overall assessment of the generality of the formatters? Are they very specific to the criterion concept they apply to? Which formatter has the most concepts associated with it? (Same questions for translators.)
Another aspect of defining/augmenting the concept-field relationship programmatically is the inability to represent fields that do not map to a Django model field, such as computed fields, e.g. class methods and properties that act on other data. For example, I am implementing the SIFT and PolyPhen2 formatters in Varify to include both the raw score and the prediction text, e.g. 'Damaging', 'Tolerated', etc. The prediction text is not a real column, but is computed. This clogs up the formatter a bit since the computed field needs to be defined and inserted in the formatter. The base `Formatter` class may be the appropriate place to put this logic, e.g. define additional fields or data prior to running it through the formatter methods. Just a thought.
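One way this could look, as a sketch only: the `Formatter` below is a stand-in base class rather than Avocado's, and `computed_fields`, `prepare`, `sift_score`, and the 0.05 cutoff are all assumptions made for illustration.

```python
class Formatter:
    """Stand-in base class (assumption), not Avocado's actual Formatter."""

    # Maps a computed field name to a function of the raw row values.
    computed_fields = {}

    def prepare(self, values):
        """Return a copy of ``values`` extended with the computed fields."""
        prepared = dict(values)
        for name, compute in self.computed_fields.items():
            prepared[name] = compute(values)
        return prepared

    def format(self, values):
        """Default behavior: pass the prepared values through unchanged."""
        return values

    def __call__(self, values):
        # Computed fields are added once here, before any formatting runs.
        return self.format(self.prepare(values))


def sift_prediction(values):
    # Illustrative cutoff only (assumption); the real rule lives in Varify.
    return "Damaging" if values["sift_score"] <= 0.05 else "Tolerated"


class SiftFormatter(Formatter):
    computed_fields = {"sift_prediction": sift_prediction}

    def format(self, values):
        return "{sift_score} ({sift_prediction})".format(**values)


print(SiftFormatter()({"sift_score": 0.01}))
```

The subclass then only declares which computed fields it needs, and the formatting method can treat them like any other value.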
I think offering a more standardized, safe formatter syntax along with examples would be helpful. Then you wouldn't have everyone inventing their own systems for representing intracell tables using columns with '|' or row breaks with '$', and then having to come up with CSS to address width issues.
Similar to #88 in practice
Avocado has two core data types, the `Field` and `Concept`.

`Field`s map to and provide an interface for Django model fields which map directly to database columns. Additional metadata such as a verbose name, plural name, units, description, etc. can be associated with a `Field` instance. The query behavior of the instance can also be customized via the `translator` attribute.

A `Concept` associates one or more `Field`s together and is the public view of what data is available for query. A `formatter` can be assigned to a concept to customize how the data is formatted on the way out of the database. In the simplest case, a single field concept would just let the data pass through as is without modifying it. In more complex cases, a second query to the database could be issued to fetch some complicated related data, a local cache hit could be performed, or even an external service could be interacted with to fill in the necessary data for the concept.

Both translators and formatters are Python classes. There is a base class for each type that provides a default behavior. Custom subclasses can be defined and registered with the respective class registry. For example:
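A minimal sketch of the subclass-and-register pattern, using a stand-in `Registry` and `Formatter` rather than Avocado's actual classes. Every name and method here (`register`, `choices`, `to_string`, `UpperCaseFormatter`) is an assumption for illustration:

```python
class Registry:
    """Stand-in class registry (assumption, not Avocado's implementation)."""

    def __init__(self):
        self._classes = {}

    def register(self, cls, name=None):
        """Store a class under a display name, defaulting to the class name."""
        self._classes[name or cls.__name__] = cls
        return cls

    @property
    def choices(self):
        """Registered names, e.g. for populating a drop-down."""
        return sorted(self._classes)

    def get(self, name):
        return self._classes[name]


class Formatter:
    """Stand-in base class providing the default behavior."""

    def to_string(self, value):
        return "" if value is None else str(value)


registry = Registry()
registry.register(Formatter, "Default")


class UpperCaseFormatter(Formatter):
    """Custom subclass that overrides the default behavior."""

    def to_string(self, value):
        return super().to_string(value).upper()


registry.register(UpperCaseFormatter, "Upper Case")

print(registry.choices)
```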
Avocado currently supports defining the translator and formatter at the database level (e.g. in the Django admin). The items registered for each type (as shown above) populate the available choices that can be selected. For example, all translators registered will appear in the `translator` drop-down in the admin and likewise for the formatters.

The assumption for this design was that it would be easy for a non-technical admin to manage the metadata in the admin interface (most people like a UI) and it would be easy to change these attributes on-the-fly if needed.
In theory this sounds like a clever way to bridge the gap between code and data; however, in practice there is a disconnect between what data is being acted on (e.g. the fields in a concept) and the implementation of the class (e.g. the formatter).
Given this formatter class:
The formatter is making two assumptions, including that the concept has fields with a `field_name` of `'foo'` and `'bar'`.
This makes the formatter tied to concepts of a very specific type. It is arguably not appropriate to have this formatter in the drop-down list in the admin since a user could select the formatter for a concept not compatible with the formatter. This leads me to my first question and concern.
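To illustrate the coupling, here is a hypothetical formatter hard-wired to fields named `'foo'` and `'bar'`, and what happens when it is applied to a concept without them. `FooBarFormatter` and `apply_formatter` are illustrative names, not Avocado code:

```python
class FooBarFormatter:
    """Hypothetical formatter that only works for concepts with 'foo' and 'bar'."""

    def format(self, values):
        # Hard-coded field names: any other concept will fail here.
        return "{} / {}".format(values["foo"], values["bar"])


def apply_formatter(formatter, values):
    """Apply a formatter to a row, reporting a missing field instead of crashing."""
    try:
        return formatter.format(values)
    except KeyError as exc:
        return "incompatible concept: missing field {}".format(exc)


# A compatible concept supplies both fields.
print(apply_formatter(FooBarFormatter(), {"foo": 1, "bar": 2}))

# A concept selected by mistake in the admin does not.
print(apply_formatter(FooBarFormatter(), {"age": 30}))
```

Nothing in a plain drop-down stops the second case from being configured, so the mismatch only surfaces when data is formatted.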
If users managing the data in the admin are not assumed to know the underlying implementations of the formatters and translators, how can the above stated mishap be prevented?
Without thorough, non-technical documentation (which may itself lead to more confusion), I am not sure this issue can be prevented. In most cases formatters and translators are closely tied to the particular one or few instances they apply to. Making them available as a drop-down list in an admin interface makes it seem that they can be arbitrarily changed, which would certainly lead to issues.