Super awesome interactive taxonomic tree of COMPADRE

Spandex-at-Exeter / demography_database

Demography Database - Flask web application incorporating COMPADRE and COMADRE

MIT License

How it might work:

Get taxonomic information from the database in a kingdom/phylum/class/order/family/genus table format. This could be done via templating (slow) or AJAX (more complicated?).
Turn this data into Newick format, possibly with the following snippet, found at https://www.biostars.org/p/114387/:

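import csv
from collections import defaultdict
from pprint import pprint


def tree():
    # Nested dict that creates a new subtree whenever a missing key is accessed.
    return defaultdict(tree)


def tree_add(t, path):
    # Walk the path (e.g. kingdom -> ... -> genus), creating nodes as needed.
    for node in path:
        t = t[node]


def pprint_tree(tree_instance):
    # Convert nested defaultdicts to plain dicts so the output is readable.
    def dicts(t):
        return {k: dicts(t[k]) for k in t}
    pprint(dicts(tree_instance))


def csv_to_tree(lines):
    # Each CSV row is one root-to-leaf path through the taxonomy.
    t = tree()
    for row in csv.reader(lines, quotechar='\''):
        tree_add(t, row)
    return t


def tree_to_newick(root):
    # Emit "(children)label" for every node, joined by commas.
    items = []
    for k, children in root.items():  # .items() replaces Python 2's .iterkeys()
        s = ''
        if children:
            sub_tree = tree_to_newick(children)
            if sub_tree != '':
                s += '(' + sub_tree + ')'
        s += k
        items.append(s)
    return ','.join(items)


def csv_to_weightless_newick(lines):
    t = csv_to_tree(lines)
    # pprint_tree(t)
    return tree_to_newick(t)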

Parse the Newick string into JSON using newick.js, or find a way to skip the Newick conversion step completely
Use d3.js to draw the taxonomic tree (http://bl.ocks.org/d3noob/8375092) and work out how to add tags so that they are hyperlinked
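
One possible way to skip the Newick conversion is to build the nested structure d3 reads directly from the taxonomy rows. The sketch below is only an illustration: `rows_to_hierarchy`, the `"COMPADRE"` root label and the sample rows are made up for the example and are not taken from the database. It assumes the drawing code expects nodes with `name` and `children` keys.

```python
import json


def rows_to_hierarchy(rows, root_name="COMPADRE"):
    """Build a {"name": ..., "children": [...]} tree from rank-ordered rows."""
    root = {"name": root_name, "children": []}
    for row in rows:
        node = root
        for label in row:
            # Reuse the child with this label if it already exists, else create it.
            child = next((c for c in node["children"] if c["name"] == label), None)
            if child is None:
                child = {"name": label, "children": []}
                node["children"].append(child)
            node = child
    return root


# Hypothetical sample rows in kingdom/phylum/class/order/family/genus order.
rows = [
    ["Plantae", "Magnoliophyta", "Magnoliopsida", "Fabales", "Fabaceae", "Trifolium"],
    ["Plantae", "Magnoliophyta", "Magnoliopsida", "Fabales", "Fabaceae", "Lupinus"],
    ["Plantae", "Magnoliophyta", "Liliopsida", "Poales", "Poaceae", "Festuca"],
]
hierarchy = rows_to_hierarchy(rows)
print(json.dumps(hierarchy, indent=2))
```

Each node could later carry an extra field (for example a link target) if the hyperlinked tags end up being driven from the same file.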

I think I understand - one drawing for the whole database?

In terms of the image: love the idea of using d3. We could make a script that runs daily and saves the taxonomic data to a .json file. The .json file would always have the same name; we can just duplicate old ones and append a timestamp in order to keep a reliable archive. We can call on this .json file to draw the tree. My only concern is that it will be a big ask to draw it to the DOM on each request, so we may need to look at an alternative (such as pre-rendering with the daily script).
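
A rough sketch of the "same name, timestamped archive" idea, just to make it concrete. The function name `save_snapshot`, the file name `taxonomy.json` and the timestamp format are all assumptions, and the example writes into a temporary directory rather than anywhere in the real app.

```python
import json
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def save_snapshot(data, path):
    """Write data to path, first copying any existing file to a timestamped name."""
    path = Path(path)
    archived = None
    if path.exists():
        stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
        archived = path.with_name(f"{path.stem}-{stamp}{path.suffix}")
        shutil.copy2(path, archived)  # keep the old version as an archive copy
    path.write_text(json.dumps(data))
    return archived


# Two runs against a throwaway directory: the second run archives the first file.
workdir = Path(tempfile.mkdtemp())
target = workdir / "taxonomy.json"
first = save_snapshot({"name": "COMPADRE", "children": []}, target)
second = save_snapshot({"name": "COMPADRE", "children": [{"name": "Plantae"}]}, target)
print(first, second)
```

With a one-second timestamp, two runs within the same second would overwrite the same archive copy, which should not matter for a daily job.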

I think it'll be a case of trying it out using the above method and improving it if needs be.

In terms of making a CSV of all the taxonomic information, that's a script we can run too.
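
As a sketch of what that script might write: how the records come out of the database is left open here, so the example assumes they arrive as dicts keyed by rank name. `write_taxonomy_csv` and the sample records are hypothetical. The quote character matches the one the CSV reader in the snippet above uses.

```python
import csv
import io

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def write_taxonomy_csv(records, out):
    """Write one row per record, columns in rank order, with no header row."""
    writer = csv.writer(out, quotechar="'", lineterminator="\n")
    for record in records:
        writer.writerow([record[rank] for rank in RANKS])


# Hypothetical records; in practice these would come from a database query.
records = [
    {"kingdom": "Plantae", "phylum": "Magnoliophyta", "class": "Magnoliopsida",
     "order": "Fabales", "family": "Fabaceae", "genus": "Trifolium"},
    {"kingdom": "Plantae", "phylum": "Magnoliophyta", "class": "Liliopsida",
     "order": "Poales", "family": "Poaceae", "genus": "Festuca"},
]
buffer = io.StringIO()
write_taxonomy_csv(records, buffer)
print(buffer.getvalue())
```

The header row is left out on purpose so that every line is a root-to-leaf path and can be fed straight into a tree builder.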

It might make sense to have a single Python file that holds several maintenance tasks, such as this one, meta tables, an SQL dump, outputting a schema, etc., so it isn't all so disjointed. It could be run daily as a system task, or whenever the database gets a new record.
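
Something along these lines could work as the single maintenance file. It is a minimal sketch: the task names and their bodies are placeholders that only return a description, and the `task` decorator and `run` function are invented for the example.

```python
import argparse

TASKS = {}


def task(func):
    """Register a function as a named maintenance task."""
    TASKS[func.__name__] = func
    return func


@task
def taxonomy_json():
    return "would regenerate the taxonomy .json file"  # placeholder body


@task
def sql_dump():
    return "would write an SQL dump"  # placeholder body


@task
def schema():
    return "would output the schema"  # placeholder body


def run(argv):
    """Run the named tasks, or every registered task if none are named."""
    parser = argparse.ArgumentParser(description="Run maintenance tasks")
    parser.add_argument("tasks", nargs="*", help="task names; default is all")
    args = parser.parse_args(argv)
    names = args.tasks or sorted(TASKS)
    unknown = [name for name in names if name not in TASKS]
    if unknown:
        parser.error("unknown task(s): " + ", ".join(unknown))
    return {name: TASKS[name]() for name in names}


results = run(["schema"])
print(results)
```

A daily system task would call `run` with no task names to run everything, while a hook on new records could name just the tasks it needs.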

Super awesome interactive taxonomic tree of COMPADRE #45