DCT: incorporate clustering plug-in

ErwinKomen commented 10 months ago

Integrate the clustering plug-in that we talked about.

ErwinKomen commented 10 months ago

Explanation

The passim plugin consists of two basic parts:

First part: the dashboard, which works on 'prepared' data. It consists of:
1. Outward appearance (front end)
2. Functions behind the 'page' that interact with the plugin
3. The plugin itself (already created by GS)
Second part: a method to prepare the data for the dashboard

ErwinKomen commented 9 months ago

Implementation: part 1, user dashboard

Added models (database tables) to support the dashboard:
1. BoardDataset: points to a location on the server that holds pre-calculated dataset data
2. SermonsDistance: list of methods to measure the distance between sermons; pre-set via the admin interface
3. SeriesDistance: list of methods to measure the distance between series; pre-set via the admin interface
4. Dimension: choice between 2D and 3D (for the moment; pre-set via the admin interface)
5. ClMethod: list of clustering methods that can be used; pre-set via the admin interface
6. Highlight: list of fields and other items that can be used as a 'highlight'. It has two main components:
  1. A number of fields from SermonDescr: library, idno, lcity, lcountry, date, total, sermons, content, century, age, is_emblamatic
  2. Full manuscript names, using their standard identification (city, library, shelfmark)
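A minimal sketch of how the two Highlight components could be combined into one list of choices for the dashboard form. The field names are copied from the list above; the `Manuscript` class, the `highlight_choices` helper and the `field:`/`ms:` value prefixes are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

# Fields from SermonDescr that can act as a highlight (copied from the list above)
SERMONDESCR_HIGHLIGHT_FIELDS = [
    "library", "idno", "lcity", "lcountry", "date", "total",
    "sermons", "content", "century", "age", "is_emblamatic",
]


@dataclass
class Manuscript:
    """Hypothetical stand-in for a manuscript record."""
    city: str
    library: str
    shelfmark: str

    def full_name(self) -> str:
        # Standard identification: city, library, shelfmark
        return f"{self.city}, {self.library}, {self.shelfmark}"


def highlight_choices(manuscripts):
    """Build (value, label) pairs for a highlight dropdown (hypothetical helper)."""
    # Prefixes keep field names and manuscript names apart in the submitted value
    choices = [(f"field:{name}", name) for name in SERMONDESCR_HIGHLIGHT_FIELDS]
    choices += [(f"ms:{ms.full_name()}", ms.full_name()) for ms in manuscripts]
    return choices
```

For example, `highlight_choices([Manuscript("CityA", "LibraryB", "Shelf 1")])` (placeholder data) yields the 11 field choices followed by one manuscript choice labelled `CityA, LibraryB, Shelf 1`.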
Added Tools > Plugin, a link to the dashboard, which is handled by sermboard in plugin/view.py
1. This offers the user all the data that can be chosen from, via BoardForm and plugin/sermonboard.html
Create space for the datasets on the server
1. When we move to a container, this location should be 'externally linked'
2. Whether via link or not, the location will be MEDIA_DIR/plugin/preprocessed_data/...
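Because the dataset folder may be a real directory or an externally linked one, code that reads it can resolve the path and check that a dataset name stays inside `preprocessed_data`. A minimal sketch, assuming `MEDIA_DIR` is passed in as a `Path` and that each dataset lives in its own subfolder; `dataset_dir` is a hypothetical helper name.

```python
from pathlib import Path


def dataset_dir(media_dir: Path, dataset_name: str) -> Path:
    """Return the folder of one pre-calculated dataset under MEDIA_DIR.

    resolve() follows symlinks, so the same code works whether
    plugin/preprocessed_data is a local folder or an external link.
    """
    base = (media_dir / "plugin" / "preprocessed_data").resolve()
    target = (base / dataset_name).resolve()
    # Refuse names such as "../x" that would escape the dataset folder
    if base not in target.parents:
        raise ValueError(f"invalid dataset name: {dataset_name!r}")
    return target
```

Resolving both paths before comparing them keeps the check valid even when the base folder itself is a symlink to another disk or volume.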
The datasets need to be 'loaded' in order to be usable. When to load them?
1. Use a global store in calculate.py
2. Load this store as soon as a new GenGraph object is created
  1. If loading is slow: run it in a separate background thread
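The loading strategy above (a global store, filled once when the first GenGraph is created, optionally in a background thread) can be sketched as follows. This assumes a `loader` callable that returns a dict of datasets; the `GenGraph` class here is a hypothetical stand-in, not the real one in calculate.py.

```python
import threading

# Module-level ("global") store, as proposed for calculate.py
_STORE = {}
_STORE_LOCK = threading.Lock()
_STORE_READY = threading.Event()
_loader_started = False
_load_error = None


def _load_store(loader):
    """Run the (possibly slow) loader and fill the global store."""
    global _load_error
    try:
        data = loader()
        with _STORE_LOCK:
            _STORE.update(data)
    except Exception as exc:  # remember the failure instead of hanging readers
        _load_error = exc
    finally:
        _STORE_READY.set()


def ensure_store_loading(loader):
    """Start loading exactly once, in a background daemon thread."""
    global _loader_started
    with _STORE_LOCK:
        if _loader_started:
            return
        _loader_started = True
    threading.Thread(target=_load_store, args=(loader,), daemon=True).start()


class GenGraph:
    """Hypothetical stand-in for the GenGraph class in calculate.py."""

    def __init__(self, loader):
        # Creating a GenGraph triggers loading of the store (only the first time)
        ensure_store_loading(loader)

    def get_dataset(self, name, timeout=None):
        # Wait until the background load has finished, or until timeout expires
        if not _STORE_READY.wait(timeout):
            raise TimeoutError("dataset store is still loading")
        if _load_error is not None:
            raise RuntimeError("loading the dataset store failed") from _load_error
        with _STORE_LOCK:
            return _STORE[name]
```

Page requests never block on loading until they actually need a dataset, and a failed load surfaces as an error rather than an endless wait.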
The dashboard options should depend on the tab that is chosen
1. Umap: initial highlight should be lcountry

ErwinKomen commented 9 months ago

Remaining issues

Make the user interface respond to changing tab pages
1. Use the <div> groups umap_params and clustering_params: works
Implementing Clustering:
1. Cannot find the name linkage in the calculations
  1. This is a function from scipy/cluster/hierarchy.py
  2. It should have been imported via `from scipy.cluster.hierarchy import *`
2. Okay, it is working now!
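The `linkage` error came down to a missing import. A minimal sketch that imports `linkage` and `fcluster` explicitly (so a missing import fails loudly at the top of the module) and runs the `ward` method; the four points are made-up illustration data, not passim data, and the actual input used in calculate.py may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage  # explicit, instead of *

# Made-up feature vectors, one row per manuscript
points = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [10.0, 10.0],
    [10.0, 11.0],
])

# Ward linkage on raw observations: scipy computes Euclidean distances itself
Z = linkage(points, method="ward")

# Cut the tree into at most two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")

print(Z.shape)  # (3, 4): n - 1 merge steps, 4 columns each
print(labels)
```

Note that SciPy documents Ward (like centroid and median) as correct only for Euclidean distances; if a precomputed, non-Euclidean sermon or series distance is passed as a condensed matrix, a method such as `average` is the safer choice.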
Implementing Umap: is working
Implementing Series Heatmap:
Implementing Sermons Heatmap:
1. No figure is produced...
2. Error "None of Code in ...": the correct combination of Sermons Distance and Series Distance is required
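Instead of letting the heatmap fail with a cryptic error, the chosen pair of distances could be checked before plotting. A minimal sketch, assuming the available (Sermons Distance, Series Distance) pairs can be collected as a set, for example from the dataset folder; `check_distance_combination` and the pair names in the usage are hypothetical.

```python
def check_distance_combination(sermons_distance, series_distance, available):
    """Return the (sermons, series) pair if pre-calculated data exists for it.

    `available` is a hypothetical set of (sermons_distance, series_distance)
    pairs for which the server holds pre-calculated data.
    """
    pair = (sermons_distance, series_distance)
    if pair not in available:
        options = "; ".join(f"{a} + {b}" for a, b in sorted(available)) or "none"
        raise ValueError(
            f"No pre-calculated data for Sermons Distance {sermons_distance!r} "
            f"combined with Series Distance {series_distance!r}. "
            f"Available combinations: {options}"
        )
    return pair
```

With `available = {("dist_a", "series_x")}`, choosing `("dist_a", "series_y")` raises a `ValueError` that lists `dist_a + series_x` as the valid option, which the dashboard can show to the user.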

| Filter | `Clustering` | `Umap` | `Series Heatmap` | `Sermons Heatmap` |
|---|---|---|---|---|
| Minimal collection length | 5 | 5 | 5 | 5 |
| Sermons | + | + | + | + |
| Anchor manuscript | + | + | + | + |
| Number of closest manuscripts | 10 | 10 | 10 | 10 |
| Target dimension | - | `2D` | - | - |
| Highlight | - | `century` | - | - |
| Number of neighbours | - | 10 | - | - |
| Minimal distance | - | 0.1 | - | - |
| Clustering method | `ward` | - | - | - |
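The table can also be read as configuration: some filters are shared by all tabs, others belong to one tab only. A minimal sketch that transcribes it into Python dicts; the key names are hypothetical, and it assumes "+" means the filter is shown without a preset value (stored as `True`) while "-" means it does not apply to that tab (key omitted).

```python
# Per-tab filter settings, transcribed from the table above (key names hypothetical)
COMMON_FILTERS = {
    "min_collection_length": 5,
    "sermons": True,            # "+": shown, no preset value
    "anchor_manuscript": True,  # "+": shown, no preset value
    "closest_manuscripts": 10,
}

TAB_FILTERS = {
    "clustering": {"clustering_method": "ward"},
    "umap": {
        "target_dimension": "2D",
        "highlight": "century",
        "neighbours": 10,
        "min_distance": 0.1,
    },
    "series_heatmap": {},
    "sermons_heatmap": {},
}


def settings_for_tab(tab):
    """Merge the filters shared by all tabs with the tab-specific ones."""
    return {**COMMON_FILTERS, **TAB_FILTERS[tab]}


def visible_param_groups(tab):
    """Names of the extra parameter <div> groups to show for a tab.

    Only tabs with tab-specific filters get a group, which yields
    umap_params and clustering_params.
    """
    return [f"{tab}_params"] if TAB_FILTERS[tab] else []
```

Keeping the shared filters in one place means a change such as a new default for "Number of closest manuscripts" applies to all four tabs at once.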

ErwinKomen commented 9 months ago

Follow-up: see issue #733

ErwinKomen / RU-passim

DCT: incorporate clustering plug-in #711