WorldHistoricalGazetteer / whg3

Version 3 beta
BSD 3-Clause "New" or "Revised" License
4 stars 4 forks source link

Beta Launch Pre-Flight Check #214

Closed docuracy closed 4 months ago

docuracy commented 4 months ago
docuracy commented 4 months ago

@kgeographer

Some portal pages (well, at least one - I can't check others because the server is rebooting, I think) are failing to load due to malformed timespans, which come directly from the timespans JSONField in the Place model, for example (https://dev.whgazetteer.org/places/13970069/portal/):

... [200, 1933], [500, 1800], '[', '2', '0' ...

It seems that integer values might have been stored as strings within encoded quote marks (or possibly the square brackets and commas are enclosed in encoded quote marks, though this seems less likely).

kgeographer commented 4 months ago

I see what the issue is with timespans, and have partially fixed it - some values were malformed in the migration, enclosing the entire value in double quotes; got rid of those. Also many nulls got transformed to '"\N"' and similar - fixed that. There are 183 records that have a mix of integers and quoted strings in the arrays - so the arrays are valid...that's how they are in v2. Ok to remain? e.g. [[-600, 600], ["135", "250"]] and [[-9999, -1000], ["135", "250"], [1525, 1525]]

I'll have to speak to the developer about normalization :^P

docuracy commented 4 months ago

KG: slow load on server, not locally so it's the data. removed collections from featured status, no help. the browser network console shows 'pending' [figs below]. best to wait & see whether adding collection data solves it.

image image

kgeographer commented 4 months ago

re: basemaps, only the home and gallery pages must have 'bare topography' - areas and tasks can also, no need for context in those - well maybe countries would be nice to see

on any that offer a zoom to feature, there has to be a buffer - most zoom too far in by far

docuracy commented 4 months ago

re: basemaps, only the home and gallery pages must have 'bare topography' - areas and tasks can also, no need for context in those - well maybe countries would be nice to see

on any that offer a zoom to feature, there has to be a buffer - most zoom too far in by far

There are now 3 different map styles in use:

If it becomes apparent that anything is annoying on the areas and tasks pages, it's easy enough to define an additional style. And next time I'll be sure to avoid spurious trailing commas in JSON lists (which are legal in Python data structures): the tileserver doesn't do dodgy JSON.

There's now a global defaultZoom variable set to the value 8 in base.js, which is applied as a maximum zoom parameter wherever the whg_maplibre.fitViewPort method is called - except on the home page and in carousels. It seems to me to be about right in most contexts, but can now easily be adjusted.

docuracy commented 4 months ago

The codebase was using a UMD version of spin.js, which according to the documentation "is only available as a temporary workaround".

It is now (with the next PR) using the recommended ECMAScript module, bundled with Webpack and extended in base.js to allow initialisation and stopping with simple JQuery selectors. Scaling is determined automatically based on the size of the target container.

Examples:

$('#drftable_list, #detail').spin(); // Start spinners on single or multiple elements - can also use `.class` selectors

$('#detail, .maplibregl-control-container').stopSpin(); // Stop existing spinners on single or multiple elements
docuracy commented 4 months ago

https://dev.whgazetteer.org/ Testing

Mapdata & Filecache

Map layer paint issues

UI Issues

Image

docuracy commented 4 months ago

Optimisation of datasets.utils.fetch_mapdata_ds still results in very long processing time, and so I have now implemented Django's FileBasedCache for this function and for collection.views.fetch_mapdata_coll. Server response times are reduced drastically after the first visit to a page:

Chinese Counties: 5.53s -> 29.53ms Chinese Towns: 55.57s -> 515ms Pokémon: 352ms -> 8.87ms

I've set the cache system (in settings.py) with no expiry, meaning that the functions will need to be called with ?refresh_cache=true whenever the dataset or collection is updated, or whenever it has a tileset generated or removed. Calling them on first publication would of course remove the potentially-long wait on a first visit to the associated page.

Apart from evident performance gains, this will reduce the need to upgrade server processing and memory capacity. The disk capacity cost of maintaining a set of cache files (just one for each Dataset/Collection) is probably slight, but my security clearance level does not permit me to access the (new) /cache directory to assess the size of the generated files.

Modifications are currently running on DEV in a hotfix branch. I'll copy them to my local branch and make a PR in due course if you're satisfied with them, @kgeographer.

docuracy commented 4 months ago

I've identified the problem that was causing Portal pages to fail to load properly. If the places had only one timespan between them the histogram generator class (in historygram.js) would crash (silently, for some reason). Hotfix applied (no histogram for such cases).

The class would also fail (and the page stall) if the total range exceeded 500 years: that's also hotfixed.

Incidentally, London in the DK dataset includes coordinates on both sides of the Atlantic, wrongly implying that they are the same place. The gloss includes metadata extracted from the map layers at the centroid of these points, mid-Atlantic.

docuracy commented 4 months ago

After much trial-and-error hot-fixing, I've identified the source of the sometimes > 60-second delay in loading Place Detail pages, on the Dataset model:

  @property
  def minmax(self):
    # TODO: temporal is sparse, sometimes [None, None]
    timespans = [p.minmax for p in self.places.all() \
                 if p.minmax and len(p.minmax) == 2 and p.minmax[0]]
    # print('ds; timespans as model property', self.label, timespans)
    earliest = min([t[0] for t in timespans]) if len(timespans) > 0 else None
    latest = max([t[1] for t in timespans]) if len(timespans) > 0 else None
    minmax = [earliest, latest] if earliest and latest else None
    return minmax

I've replaced it with this, which deals with the #TODO and is much more efficient:

  @property
  def minmax(self):
    minmax_values = Place.objects.filter(dataset=self).aggregate(
        # This ignores `None` values, effectively handling temporal sparsity [None, None]
        earliest=Min('minmax__0'),
        latest=Max('minmax__1')
    )
    earliest = minmax_values['earliest']
    latest = minmax_values['latest']
    return [earliest, latest] if earliest and latest else None

Server response time for https://dev.whgazetteer.org/places/5907944/detail is reduced from 62 seconds to 1.05 seconds.

docuracy commented 4 months ago

Search from Home Page was broken, because after a successful search and redirection to the Search Page, search.js was not loading results from the last_search in LocalStorage. This seems to be due to tightening of default browser security policies, disallowing access to document.referrer. There were also problems with the way fclasses were being (mis-)handled. Hotfixes copied to my current branch.

docuracy commented 4 months ago

FYI, @kgeographer, some of the trace data (well, one trace in one collection that I know of, number 76) has a zero day number in a yearstring (which was breaking fetch_mapdata_coll). I've modified collection.views.year_from_string to handle it, but you might want to parse incoming data to eliminate things like that.

docuracy commented 4 months ago

Bug found when examining test dataset locally: even if this is just a test data problem, code should be made more robust to handle this eventuality.

http://localhost:8001/api/place/25195/

Traceback (most recent call last):
  File "/py/lib/python3.10/site-packages/django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "/py/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/py/lib/python3.10/site-packages/django/views/generic/base.py", line 103, in view
    return self.dispatch(request, *args, **kwargs)
  File "/py/lib/python3.10/site-packages/django/views/generic/base.py", line 142, in dispatch
    return handler(request, *args, **kwargs)
  File "/app/api/views.py", line 1197, in get
    "types": add_urls(sort_unique([type for place in serialized_places for type in place["types"]], 'label')),
  File "/app/api/views.py", line 1130, in sort_unique
    item_unique = item[key] if key else json.dumps(item)

Exception Type: KeyError at /api/place/25195/
Exception Value: 'label'