Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy/ This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License
19 stars 12 forks source link

Code hygiene: make a Django model about the database again #759

Open Woseseltops opened 3 years ago

Woseseltops commented 3 years ago

I think the Django philosophy (and the 'model view template' philosohpy in general) is that:

While it is totally okay to have a few lines of code inside a model, for example to construct some more information about that 1 object on the fly, I notice that we are often doing lots of logic inside model itself, in particular for the Gloss model. I think that this should be considered 'dirty' and that we should refactor for this to be in a view or helper function.

A pattern I feel should be avoided in particular is logic where inside a model (like a gloss) something is done or comparisons are made to other objects (other glosses).

Woseseltops commented 3 years ago

Also, to be sure, not blaming anyone here, I think we all forgot to keep this in mind :)

susanodd commented 3 years ago

@Woseseltops I understand what you're saying. I think one glitch is that there are a lot of relations in the models. As you introduced this via the minimal pairs, that is kind of an example of something that is kind of a relation, but it's also computed. This is partly because there are plenty of glosses for which the phonology has not been filled in completely. And users perhaps also may be unaware that a minimal pair exists. (?) I don't think minimal pairs can be implemented as relations. (?) But they are computed, so that means they should not be in the models. (?) Moreover, some of the "dirty" code is making use of the field choices and the field_choice_category. Although that has kind of permeated all of the code. And they also need to be translated in their human readable form. At one point we wanted the field_choice_category to remain hidden. But that has also permeated all of the code.

susanodd commented 3 years ago

One part that ended up in the model is stuff with path names locating images and videos for a gloss. And added to that are paths related to datasets. Those aren't part of the database per se, (and|but) they are stored in the file system.

Some of the stuff in the database aka models, are keywords and translations. That's not really a database table or model in the same way a Gloss is.

Woseseltops commented 3 years ago

Good examples @susanodd . Let's organize this situation!

Is about one object Is about database structure Does not include (complex) logic
Path naming Yes No Yes
Minimal pairs No No No

Based on this, I would say the minimal pairs functionality would be a good first candidate to move elsewhere, perhaps a separate .py file? By the way, when I say moving elsewhere, I simply mean to translate something like this:

class MyDjangoModel:
    def calculate_this_and_that(self):
        return self.this_var * self.that_var

... to something like this ...

function calculate_this_and_that_for_model(my_django_model):
    return my_django_model.this_var * my_django_model.that_var
susanodd commented 3 years ago

Okay, that helps. I thought of another example that I ended up moving code to tools.py. The code that creates the scroll search results at the tops of the detail views. https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/tools.py#L2259

The code is parametric (in a sense) on the type of things in the scroll bar. Creating that function removed a lot of "versions of the same code on different object types" that was in the adminviews.

I guess that's like meta computations that are done for different models.

What helped in the past to reduce code was when we started using ajax to create the table rows in the list view, so the row template was moved to a different file. https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/dictionary/templates/dictionary/admin_gloss_list.html#L150-L190

At the time, I worked on that because the list view template had become unmanageable. That was about the time you insisted on all the settings to get rid of hard coding.

A huge amounts of code in some of the templates now is the actual query formulation. Although that ought to be something that Django is supposed to be helping with. In the admin_gloss_list there's a lot of inline code for creating glosses.

susanodd commented 3 years ago

I guess what sometimes confuses me is the old-fashioned idea of "abstraction" that you ought not to reveal the insides of the model outside of the model, and only use the methods of the model outside of the model. That's how some code has ended up "inside" the Gloss class as methods. So it would be an "abstract method". Although such intention is defeated when other parts of the code looks inside anyway and breaks the abstraction. There are also instances of "fiddling" with trying to figure out exactly what the "result" of the abstraction should be, as with the numerous methods to extract different kinds of paths. In trying to locate the "path" methods, I noticed this non-parametric obtuse code (mea culpa): https://github.com/Signbank/Global-signbank/blob/9d0bddac2ed6ce631fc75f97de7821fa98ea9d3d/signbank/dictionary/models.py#L966-L1013

That looks like either too much "abstraction" or too much "redundant" code.

susanodd commented 3 years ago

Those count methods shown above are used in the Relations View to determine whether different sections of the display will be empty,

susanodd commented 3 years ago

Maybe we need a list showing "problems" where the code is examples of inappropriate coding style.

TO DO stuff.

susanodd commented 3 years ago

@Woseseltops I've encountered code that needs to be implemented to support the Frequency data of datasets. (#756)

Using the GlossFrequency objects model, and the idea that Regions (saved as meta data for speakers) need to be dynamic rather than hard-coded as they are now for CNGT in the Gloss model, there is actually quite a lot of computation needed in order to "calculate" the "number of occurrences per region" for a given gloss. (Formerly hard-coded as fields in Gloss). Basically, the GlossFrequency objects for the particular gloss need to be queried to determine how many speakers (Signers) per region use this gloss. The region is a text field ('location') of Speaker. This seems like an example of something that should not be done in models.py. Although the objects are stored in the database. The regions are in a meta csv file.

susanodd commented 2 years ago

@Woseseltops this issue is very relevant for the GlossFrequency / Corpus data. The granularity of the data is very small. Moreover, it ends up being accessed via the Dataset, the Speaker, the Document, and the Gloss. It's not entirely clear what kinds of functions are most efficient or most useful. I've ended up making a variety of helper functions to assist in testing. And then implemented tests on the alternative helper functions, that they indeed give the same results.

susanodd commented 2 years ago

Here are some examples on my local server (just the signature and functionality):

def gloss_to_speakers(gloss_id): # maps a gloss id to speakers that occur in frequency data for the gloss

def glosses_X_speakers(dataset_acronym): # maps a corpus name to a dictionary that maps gloss ids to speakers that occur in frequency data for that gloss

The thing is, both are useful elsewhere, at the gloss level, or at the corpus level. But the two functions are very similar. The idea is to avoid an extreme amount of queries to the database. So for the function that returns a dictionary, it only does one query instead of one for each gloss as would be the case if the first is used inside an iteration.

susanodd commented 2 years ago

This issue has been adhered to in recent updates to the model in #793. The Dataset method uploaded_eafs file system specific functionality has been moved to frequency.py

The trimmed down method remains in the model as this is more convenient to call in a template, having the dataset object on hand.