bruvellu / cifonauta

Marine biology image database by CEBIMar/USP
http://cifonauta.cebimar.usp.br
GNU General Public License v3.0

Optimize queries for tree visualization #300

Open: bruvellu opened this issue 3 months ago

bruvellu commented 3 months ago

Now that users can add new taxa to the database, we need a way to show only “public” taxa in the public taxonomic tree (i.e., taxa that have published media files).

Formerly, the tree was generated from all the taxa in the database, because a taxon was only added when a media file tagged with it was published. Now, however, a taxon can be added before its associated media files are published. Old queries:

# meta/templatetags/extra_tags.py
taxa = Taxon.objects.select_related('parent')
# meta/views.py
genera = Taxon.objects.filter(rank_en='Genus').order_by('name')

As of d61536f9b6a04b0e3c9fcc6b9fc6a57f8e3b73bc, new queries are in place. They work, but take a very long time to run: checking the status of every media file of every taxon and its ancestors is inefficient.

# meta/templatetags/extra_tags.py
taxa = Taxon.objects.filter(media__status='published').get_ancestors(include_self=True)
# meta/views.py
genera = Taxon.objects.filter(media__status='published').get_ancestors(include_self=True).filter(query)
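django-mptt stores each taxon's position as a nested-set interval (the tree_id, lft and rght columns), and calling get_ancestors(include_self=True) on a queryset selects every node in the same tree whose interval encloses one of the queryset's rows. The plain-Python sketch below mimics that selection to show what the new queries keep. The Node class, the ancestors() helper and the simplified tree are hypothetical stand-ins, not the real Taxon model or the django-mptt implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an MPTT-managed Taxon row (nested-set columns only)."""
    name: str
    tree_id: int
    lft: int
    rght: int


def ancestors(nodes, targets, include_self=True):
    """Return every node whose (lft, rght) interval encloses a target's interval."""
    result = set()
    for t in targets:  # one interval test per target row
        for n in nodes:
            if n.tree_id != t.tree_id:
                continue
            if include_self:
                encloses = n.lft <= t.lft and n.rght >= t.rght
            else:
                encloses = n.lft < t.lft and n.rght > t.rght
            if encloses:
                result.add(n)
    return result


# Simplified tree (ranks skipped): Animalia > Mollusca > Aplysia, and Animalia > Cnidaria.
animalia = Node("Animalia", 1, 1, 8)
mollusca = Node("Mollusca", 1, 2, 5)
aplysia = Node("Aplysia", 1, 3, 4)
cnidaria = Node("Cnidaria", 1, 6, 7)
TREE = [animalia, mollusca, aplysia, cnidaria]

# Suppose only Aplysia has published media: it and its ancestors are kept.
public = ancestors(TREE, [aplysia])
print(sorted(n.name for n in public))  # prints ['Animalia', 'Aplysia', 'Mollusca']
```

Cnidaria, which in this example has no published media, is left out of the tree even though it is in the database.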

The best solution I see so far is adding an is_public field to the Taxon model, which is False by default and only becomes True (for the taxon and its ancestors) when an associated media file is published. The tree query would then be:

# meta/templatetags/extra_tags.py
taxa = Taxon.objects.filter(is_public=True).select_related('parent')
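The proposed flag could be propagated by walking up the parent chain when a media file is published. The sketch below uses a hypothetical in-memory Taxon class and a hypothetical mark_public() helper rather than the real Django model; it assumes the invariant that a public taxon's ancestors are already public, which lets the walk stop at the first ancestor that is already flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Taxon:
    """Hypothetical in-memory stand-in for the Taxon model with the proposed flag."""
    name: str
    parent: Optional["Taxon"] = None
    is_public: bool = False  # False by default, as proposed


def mark_public(taxon):
    """Set is_public on a taxon and its ancestors; return the taxa that changed.

    Stops at the first ancestor that is already public, assuming everything
    above a public taxon has already been flagged.
    """
    changed = []
    node = taxon
    while node is not None and not node.is_public:
        node.is_public = True
        changed.append(node)
        node = node.parent
    return changed


animalia = Taxon("Animalia")
mollusca = Taxon("Mollusca", parent=animalia)
gastropoda = Taxon("Gastropoda", parent=mollusca)
aplysia = Taxon("Aplysia", parent=gastropoda)
bivalvia = Taxon("Bivalvia", parent=mollusca)

# Publishing a media file tagged with Aplysia flags the whole lineage.
print([t.name for t in mark_public(aplysia)])   # prints ['Aplysia', 'Gastropoda', 'Mollusca', 'Animalia']
# A sibling branch only needs to flag itself.
print([t.name for t in mark_public(bivalvia)])  # prints ['Bivalvia']
```

In Django, this logic might live in the media's save() method or a post_save signal, saving each changed taxon (an assumption; the issue does not say where it would go). The sketch only covers publishing: clearing the flag when media files are unpublished would need a separate check for remaining published media.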

Moreover, we should consider removing the species list from the taxa_page; it's not very useful. Also remember that MPTT is no longer maintained (#167).

bruvellu commented 3 months ago

Found a quick temporary solution: adding a distinct() call to the queryset reduces the query time from ~14 s to ~100 ms:

taxa = Taxon.objects.filter(media__status='published').distinct().get_ancestors(include_self=True)
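Filtering across the reverse media relation joins each taxon to its published media files, so a taxon with N published files appears N times in the queryset; Django documents this duplication for multi-valued relationships, and distinct() removes it. Presumably that is why distinct() helps so much here: the ancestor lookup no longer repeats work for every duplicate row. The plain-Python sketch below uses made-up rows and a hypothetical PARENT mapping (not real data) to count per-row ancestor expansions with and without deduplication; the resulting set of taxa is identical.

```python
# Hypothetical join result of Taxon.objects.filter(media__status='published'):
# one (taxon, media_id) row per published media file.
join_rows = [("Aplysia", m) for m in range(500)] + [("Bivalvia", m) for m in range(300)]

# Simplified parent links standing in for the tree (hypothetical data).
PARENT = {"Aplysia": "Gastropoda", "Gastropoda": "Mollusca",
          "Bivalvia": "Mollusca", "Mollusca": "Animalia", "Animalia": None}


def ancestors_with_self(taxa):
    """Collect each taxon and its ancestors, counting how many rows were expanded."""
    result, expanded = set(), 0
    for name in taxa:
        expanded += 1
        while name is not None:
            result.add(name)
            name = PARENT[name]
    return result, expanded


taxa_rows = [taxon for taxon, _ in join_rows]  # without distinct(): 800 rows
unique_taxa = list(dict.fromkeys(taxa_rows))   # like distinct(): 2 rows, order kept

slow, slow_work = ancestors_with_self(taxa_rows)
fast, fast_work = ancestors_with_self(unique_taxa)
print(slow == fast, slow_work, fast_work)  # prints True 800 2
```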

Implemented in 3b12bfc18773004ec0401c2c65b32e7e716a667f.