Woseseltops opened this issue 3 years ago
Also, just to be clear, I'm not blaming anyone here; I think we all forgot to keep this in mind :)
@Woseseltops I understand what you're saying. I think one complication is that there are a lot of relations in the models. You introduced this via the minimal pairs, which are an example of something that behaves like a relation but is actually computed. This is partly because there are plenty of glosses for which the phonology has not been filled in completely, and users may also be unaware that a minimal pair exists. I don't think minimal pairs can be implemented as relations (?), but since they are computed, that suggests they should not be in the models (?). Moreover, some of the "dirty" code makes use of the field choices and the field_choice_category, which have permeated all of the code. They also need to be translated into their human-readable form. At one point we wanted the field_choice_category to remain hidden, but that too has permeated all of the code.
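To make the minimal-pairs point concrete, something like the following could live in a helper module rather than on the Gloss model (a rough sketch, not the actual Signbank code; the phonology field names are placeholders):

```python
# Placeholder names, not the real Signbank phonology fields.
PHONOLOGY_FIELDS = ['handedness', 'location', 'movement']

def minimal_pairs_for(gloss, candidate_glosses):
    """Return (other_gloss, field) pairs differing from `gloss` in exactly one phonology field."""
    pairs = []
    for other in candidate_glosses:
        differing = [field for field in PHONOLOGY_FIELDS
                     if getattr(gloss, field) != getattr(other, field)]
        if len(differing) == 1:
            pairs.append((other, differing[0]))
    return pairs
```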
One part that ended up in the model is the code that constructs path names locating the images and videos for a gloss, plus paths related to datasets. Those aren't part of the database per se, but they are stored in the file system.
Some of the other things in the database (i.e., the models) are keywords and translations. Those aren't really database tables or models in the same way a Gloss is.
Good examples, @susanodd. Let's organize this situation!
| | Is about one object | Is about database structure | Does not include (complex) logic |
|---|---|---|---|
| Path naming | Yes | No | Yes |
| Minimal pairs | No | No | No |
Based on this, I would say the minimal pairs functionality would be a good first candidate to move elsewhere, perhaps to a separate .py file? By the way, when I say moving elsewhere, I simply mean translating something like this:
```python
class MyDjangoModel(models.Model):

    def calculate_this_and_that(self):
        # the logic lives inside the model as a method
        return self.this_var * self.that_var
```
... to something like this ...
```python
def calculate_this_and_that_for_model(my_django_model):
    # the same logic, moved to a plain helper function outside the model
    return my_django_model.this_var * my_django_model.that_var
```
Okay, that helps. I thought of another example where I ended up moving code to tools.py: the code that creates the scroll search results at the top of the detail views. https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/tools.py#L2259
The code is parametric (in a sense) over the type of objects in the scroll bar. Creating that function removed a lot of "versions of the same code for different object types" that were in the adminviews.
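As a rough illustration of what "parametric over the object type" means here (this is not the code in tools.py; the names are made up):

```python
# Hedged sketch: one helper parametrized over the model class, instead of
# near-identical copies for each object type in the adminviews.
def scroll_context(model_class, current_object, ordered_ids):
    """Return the previous/next objects for the scroll bar of a detail view."""
    ids = list(ordered_ids)
    index = ids.index(current_object.id)
    return {
        'previous': model_class.objects.get(id=ids[index - 1]) if index > 0 else None,
        'next': model_class.objects.get(id=ids[index + 1]) if index < len(ids) - 1 else None,
    }
```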
I guess that's like meta computations that are done for different models.
What helped reduce code in the past was when we started using ajax to create the table rows in the list view, so the row template was moved to a separate file. https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/dictionary/templates/dictionary/admin_gloss_list.html#L150-L190
At the time, I worked on that because the list view template had become unmanageable. That was around the time you insisted on moving everything into settings to get rid of hard coding.
A huge amount of code in some of the templates now is the actual query formulation, although that is something Django is supposed to help with. In the admin_gloss_list there's a lot of inline code for creating glosses.
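As a generic sketch of what I mean (not the actual admin_gloss_list code; the view name and the field lookup below are just placeholders), the query formulation can live in the view's get_queryset so the template only loops over ready-made rows:

```python
from django.views.generic import ListView
from signbank.dictionary.models import Gloss  # assumed import path

class GlossSearchListView(ListView):
    """Sketch: build the queryset in the view, keep the template logic-free."""
    model = Gloss
    template_name = 'dictionary/admin_gloss_list.html'

    def get_queryset(self):
        queryset = super().get_queryset()
        search_term = self.request.GET.get('search', '')
        if search_term:
            # 'idgloss' is used here as a placeholder field name
            queryset = queryset.filter(idgloss__icontains=search_term)
        return queryset
```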
I guess what sometimes confuses me is the old-fashioned idea of "abstraction": that you ought not to reveal the insides of the model outside of the model, and should only use the model's methods from outside. That's how some code has ended up "inside" the Gloss class as methods, as a kind of "abstract method". That intention is defeated, though, when other parts of the code look inside anyway and break the abstraction. There are also instances of "fiddling" with trying to figure out exactly what the "result" of the abstraction should be, as with the numerous methods for extracting different kinds of paths. In trying to locate the "path" methods, I noticed this non-parametric, obtuse code (mea culpa): https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/dictionary/models.py#L966-L1013
That looks like either too much "abstraction" or too much "redundant" code.
Those count methods shown above are used in the Relations View to determine whether different sections of the display will be empty.
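To give an idea of what a less redundant version could look like (a hedged sketch, not the code at the link above; the Relation field names are assumptions on my part):

```python
from signbank.dictionary.models import Relation  # assumed import path

def relation_count(gloss, role):
    """Count relations of the given role that have this gloss as their source."""
    # assumes Relation has 'source' and 'role' fields
    return Relation.objects.filter(source=gloss, role=role).count()

# The Relations View could then decide per section, e.g.:
# show_synonyms = relation_count(gloss, 'synonym') > 0
```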
Maybe we need a list of "problems", pointing to code that exemplifies the inappropriate coding style.
TO DO stuff.
@Woseseltops I've run into code that still needs to be implemented to support the frequency data of datasets (#756).
Using the GlossFrequency model, and the idea that regions (saved as metadata for speakers) need to be dynamic rather than hard-coded as they are now for CNGT in the Gloss model, there is actually quite a lot of computation needed to calculate the "number of occurrences per region" for a given gloss (formerly hard-coded as fields in Gloss). Basically, the GlossFrequency objects for the particular gloss need to be queried to determine how many speakers (signers) per region use this gloss. The region is a text field ('location') of Speaker. This seems like an example of something that should not be done in models.py, even though the objects are stored in the database. The regions themselves come from a metadata CSV file.
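Roughly, I mean something like this as a helper function outside models.py (a sketch; the foreign key names on GlossFrequency are assumptions):

```python
from collections import defaultdict

from signbank.dictionary.models import GlossFrequency  # assumed import path

def speakers_per_region(gloss):
    """Map each region to the number of distinct speakers using this gloss.

    Sketch: assumes GlossFrequency has 'gloss' and 'speaker' foreign keys and
    that Speaker stores the region in its text field 'location'.
    """
    rows = (GlossFrequency.objects
            .filter(gloss=gloss)
            .values_list('speaker__id', 'speaker__location'))
    regions = defaultdict(set)
    for speaker_id, location in rows:
        regions[location].add(speaker_id)
    return {region: len(speakers) for region, speakers in regions.items()}
```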
@Woseseltops this issue is very relevant for the GlossFrequency / Corpus data. The granularity of the data is very small. Moreover, it ends up being accessed via the Dataset, the Speaker, the Document, and the Gloss. It's not entirely clear what kinds of functions are most efficient or most useful. I've ended up making a variety of helper functions to assist in testing, and then implemented tests to check that the alternative helper functions indeed give the same results.
Here are some examples from my local server (just the signatures and their functionality):
```python
def gloss_to_speakers(gloss_id):
    # maps a gloss id to speakers that occur in frequency data for the gloss
    ...

def glosses_X_speakers(dataset_acronym):
    # maps a corpus name to a dictionary that maps gloss ids to speakers
    # that occur in frequency data for that gloss
    ...
```
The thing is, both are useful elsewhere: the first at the gloss level, the second at the corpus level. But the two functions are very similar. The idea is to avoid an extreme number of queries to the database: the function that returns a dictionary does only one query, instead of one per gloss, which is what you would get by calling the first function inside an iteration.
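As a sketch of the single-query idea (a variation, not the exact code on my local server: it takes the corpus's GlossFrequency queryset directly, to avoid guessing the exact lookup path from frequency rows to a dataset acronym):

```python
from collections import defaultdict

def glosses_to_speakers(corpus_frequencies):
    """Map gloss ids to the set of speaker ids found in the given
    GlossFrequency queryset (e.g. all frequency rows of one corpus).

    One values_list() query for the whole corpus, instead of one query per
    gloss as you would get by calling gloss_to_speakers inside a loop.
    Assumes 'gloss' and 'speaker' foreign keys on GlossFrequency.
    """
    mapping = defaultdict(set)
    for gloss_id, speaker_id in corpus_frequencies.values_list('gloss__id', 'speaker__id'):
        mapping[gloss_id].add(speaker_id)
    return dict(mapping)
```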
This issue has been taken into account in recent updates to the models in #793.
For the Dataset method uploaded_eafs:
- The file-system-specific functionality has been moved to frequency.py.
- The trimmed-down method remains in the model, since that is more convenient to call in a template when the dataset object is on hand.
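The pattern looks roughly like this (a sketch, not the exact code; apart from uploaded_eafs and frequency.py, the names and the folder layout are made up):

```python
import os

from django.db import models

# In frequency.py (sketch): the file-system-specific work lives here.
def eaf_file_names(eaf_folder):
    """List the EAF files found on disk in a dataset's corpus folder."""
    if not os.path.isdir(eaf_folder):
        return []
    return sorted(name for name in os.listdir(eaf_folder) if name.endswith('.eaf'))

# In models.py (sketch): the trimmed-down method only delegates, so a template
# that already has the dataset object on hand can still call it directly.
class Dataset(models.Model):
    acronym = models.CharField(max_length=30)  # illustrative field only

    def eaf_folder(self):
        # hypothetical helper; the real path construction is different
        return os.path.join('corpus-eaf', self.acronym)

    def uploaded_eafs(self):
        return eaf_file_names(self.eaf_folder())
```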
I think the Django philosophy (and the 'model view template' philosophy in general) is that:
While it is totally okay to have a few lines of code inside a model, for example to construct some extra information about that one object on the fly, I notice that we often put a lot of logic inside the model itself, in particular in the Gloss model. I think this should be considered 'dirty' and that we should refactor it into a view or helper function.
A pattern I feel should be avoided in particular is logic inside a model (like a gloss) that operates on or makes comparisons against other objects (other glosses).
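To illustrate that pattern with a made-up example (the 'handedness' field here is just a stand-in, not a claim about the real Gloss model):

```python
from django.db import models

# Pattern to avoid (sketch): a model method that reaches out to other glosses.
class Gloss(models.Model):
    handedness = models.CharField(max_length=64)  # illustrative field only

    def glosses_with_same_handedness(self):
        return Gloss.objects.filter(handedness=self.handedness).exclude(pk=self.pk)

# Preferred (sketch): the same comparison as a helper outside the model,
# e.g. in a view or a helper module.
def glosses_with_same_handedness(gloss):
    return Gloss.objects.filter(handedness=gloss.handedness).exclude(pk=gloss.pk)
```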