denshoproject / ddr-local

Web UI used for interacting with DDR collections and entities on a local machine.

Collections list page takes too long and times out #294

Open gjost opened 3 years ago

gjost commented 3 years ago

The collections list page is timing out and showing an Nginx 502 gateway error page. This issue affects multiple VMs. Affected VMs are able to run ssh git@mits.densho.org info as ddr without issue. Disk space is not a problem. VMs can view and work in individual collections.

To test I installed ddr-local on maunakea and ran the following code, adapted from ddr-local:webui.views.collections.collections:

from datetime import datetime
from django.conf import settings
from webui import gitolite
from webui.models import Collection
from webui.identifier import Identifier

def get_repos_orgs():
    rostart = datetime.now()
    repos_orgs = gitolite.get_repos_orgs()
    print(datetime.now() - rostart)
    return repos_orgs

def list_collections(repos_orgs):
    # Time the whole run and each repo-org group separately.
    allstart = datetime.now()
    for object_id in repos_orgs:
        identifier = Identifier(object_id)
        repo,org = list(identifier.parts.values())
        colstart = datetime.now()
        collection_paths = Collection.collection_paths(settings.MEDIA_BASE, repo, org)
        for collection_path in collection_paths:
            identifier = Identifier(path=collection_path)
            collection = Collection.from_identifier(identifier)
            # Git status is looked up once per collection.
            gitstatus = collection.gitstatus()
            print(f"    {gitstatus.get('status')}")

        # Elapsed time for this repo-org group
        print(datetime.now() - colstart, f'  {repo}-{org}')

    # Elapsed time for all groups
    print(datetime.now() - allstart)

repos_orgs = get_repos_orgs()
list_collections(repos_orgs)

This took between 46s and 1m28s to complete. The ddrlocal upstream fail_timeout in /etc/nginx/sites-enabled/ddrlocal.conf is set to 600s, so a run of this length is not long enough on its own to cause a timeout.

gjost commented 3 years ago

Looks like it's time to break up the collections list page. I'd recommend having an initial page that lists organizations, followed by paginated collections lists for each organization.
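
A rough sketch of that split, in plain Python rather than actual ddr-local views. The function names (group_by_org, paginate), the page size, and the sample collection IDs are all hypothetical, and it assumes collection IDs have the form repo-org-number:

```python
from collections import defaultdict
from math import ceil

def group_by_org(collection_ids):
    """Group collection IDs by 'repo-org' prefix (assumed ID form: repo-org-number)."""
    orgs = defaultdict(list)
    for cid in collection_ids:
        repo, org, _ = cid.split('-', 2)
        orgs[f'{repo}-{org}'].append(cid)
    return dict(orgs)

def paginate(items, page, per_page=25):
    """Return (items on the 1-indexed page, total number of pages)."""
    num_pages = max(1, ceil(len(items) / per_page))
    page = min(max(page, 1), num_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return items[start:start + per_page], num_pages

# Hypothetical sample data, for illustration only
collection_ids = [f'ddr-densho-{n}' for n in range(1, 61)]
collection_ids += ['ddr-testing-1', 'ddr-testing-2']

orgs = group_by_org(collection_ids)
print(sorted(orgs))  # initial page: organizations only

page_items, num_pages = paginate(orgs['ddr-densho'], page=3)
print(len(page_items), num_pages)  # one page of one organization's collections
```

With this shape, only the collections on the requested page would need to be loaded and checked per request, instead of every collection in every organization.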

gjost commented 3 years ago

@GeoffFroh: Turn off the sync status functionality on the collections list page for now so the archivists can get their work done.
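
One way to do that might be a flag that skips the per-collection status lookup entirely. This is only a sketch: build_collection_rows, the show_sync_status flag, and the fake_status stand-in are hypothetical names, not existing ddr-local code or settings.

```python
def build_collection_rows(collection_ids, get_status, show_sync_status=False):
    """Build rows for the list page; get_status is called only when the flag is on."""
    rows = []
    for cid in collection_ids:
        row = {'id': cid, 'status': None}
        if show_sync_status:
            # Hypothetical stand-in for the per-collection status lookup
            row['status'] = get_status(cid)
        rows.append(row)
    return rows

# Record calls so it is visible that the lookup is skipped when the flag is off.
calls = []

def fake_status(cid):
    calls.append(cid)
    return 'synced'

sample_ids = ['ddr-testing-1', 'ddr-testing-2']  # hypothetical IDs
rows_off = build_collection_rows(sample_ids, fake_status)
calls_when_off = list(calls)
rows_on = build_collection_rows(sample_ids, fake_status, show_sync_status=True)
```

Defaulting the flag to off keeps the list page usable for the archivists, and the status column can be turned back on once the page has been broken up.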