The Budō Lineage Tree is a community-driven database of teacher/student interactions, presented as an interactive lineage tree.
There are three main components in the project:
While this project uses Judo as a starting point, it is not limited to it: much like Judo had a pioneering role in Budō that influenced many other martial arts (and was in many ways a bridge between Koryū and Gendai), this project starts from Kodokan Judo with the aim of uncovering the rich history of interactions between disciplines.
Creating, adding to, and correcting the YAML files is what drives everything else.
Each YAML file describes an individual who has at least one "teacher" (see FAQ for terms): the relationships always go from student -> teacher, never the other way around. This means that the YAML files identify teachers but not students: students appear in the tree through their own files, where they are linked to their teachers.
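Because only student -> teacher links are stored, a teacher's students have to be derived by inverting those links. The sketch below shows the idea on plain dictionaries standing in for parsed YAML files. The IDs ("P-1" and so on) and the simplified shape of each entry are hypothetical and are not the project's real data or code.

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for parsed YAML files (IDs are made up):
# each person lists only their teachers (student -> teacher), never students.
people = {
    "P-1": {"teachers": []},
    "P-2": {"teachers": [{"id": "P-1"}]},
    "P-3": {"teachers": [{"id": "P-2"}, {"id": "P-1"}]},
}


def students_by_teacher(people: dict) -> dict[str, list[str]]:
    """Invert the student -> teacher links into teacher -> [students]."""
    students = defaultdict(list)
    for student_id, person in people.items():
        for teacher in person.get("teachers", []):
            students[teacher["id"]].append(student_id)
    return dict(students)


print(students_by_teacher(people))
# {'P-1': ['P-2', 'P-3'], 'P-2': ['P-3']}
```

Storing only one direction keeps each file self-contained: adding a student never requires editing the teacher's file.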
The YAML format covers these main areas:
The entire process has been built (purposely) around git, and specifically GitHub (more on that in the FAQ below). To participate, you'll need:
The GitHub account is the easy part. If you already know git and are used to git-based workflows, there's no secret here: feel free to clone or fork the repository, create a branch, edit files with any editor, and submit the PR.
If the previous paragraph was cryptic, GitHub's interface will mostly guide you through the process until we improve the instructions:
The above can still be challenging for someone completely new to GitHub, but we will improve the instructions in due time.
We will start by focusing on "leaves" that can link to some of the existing "nodes"; this means that we will focus on finding branches that connect to existing individuals, instead of adding unconnected persons.
At a later stage we will relax this requirement, but for now any addition should be connected by at least one path: it's perfectly fine to add "ancestors" who would otherwise be unconnected, as long as they are connected to someone who is linked to existing individuals.
If the above is impossibly difficult, there's an alternative: open an Issue with the information that you would like to add/change. This can take more time, but eventually it should make its way to the database, after someone picks it up and makes the corresponding changes.
To open an issue, find the "Issues" tab and create a new one.
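The rule that new additions must be connected to the existing tree by at least one path can be checked mechanically with a breadth-first search that follows links in either direction. This is a minimal sketch under assumed, simplified data shapes; `connected_to_existing` is a hypothetical helper, not part of the project's tooling, and all IDs are made up.

```python
from collections import deque


def connected_to_existing(new_entries: dict, existing_ids: set) -> dict[str, bool]:
    """For each new entry, report whether some chain of teacher/student links
    (followed in either direction) reaches an individual already in the tree."""
    # Undirected adjacency built from the new entries' student -> teacher links.
    adjacency: dict[str, set[str]] = {}
    for student_id, person in new_entries.items():
        for teacher in person.get("teachers", []):
            adjacency.setdefault(student_id, set()).add(teacher["id"])
            adjacency.setdefault(teacher["id"], set()).add(student_id)

    result = {}
    for start in new_entries:
        seen, queue, found = {start}, deque([start]), False
        while queue:  # breadth-first search starting from the new entry
            node = queue.popleft()
            if node in existing_ids:
                found = True
                break
            for neighbour in adjacency.get(node, set()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result[start] = found
    return result


# Made-up IDs: "P-1" already exists; "N-1", "N-2" and "N-9" are new files.
existing = {"P-1"}
new = {
    "N-1": {"teachers": [{"id": "N-2"}, {"id": "P-1"}]},
    "N-2": {"teachers": [{"id": "N-3"}]},  # a new "ancestor" chain
    "N-9": {"teachers": [{"id": "N-8"}]},  # not linked to anyone existing
}
print(connected_to_existing(new, existing))
# {'N-1': True, 'N-2': True, 'N-9': False}
```

Here "N-2" and its teacher "N-3" are accepted because they reach the tree through "N-1", while "N-9" would need at least one link to an existing individual.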
Every entry should have one or more sources that allow anyone to determine where the information comes from. This might seem overkill for well-known aspects of well-known people, but consistency here is very important: the reality is that our knowledge is often based on a mix of myths, half-truths, unconfirmed events, and partial understanding of real events. By explicitly adding sources, we can at least clearly identify the origin of the information.
Not all sources have the same status: a random comment in an internet forum, by an anonymous user who doesn't state the origin of the information, is clearly less authoritative than a published, peer-reviewed paper. This is not to say that one is right and the other is wrong: merely that, faced with conflicting sources, those that identify their own sources carry more weight.
How should we reflect conflicting information about lineage-related aspects? Should we show only paths that can be shown to have sufficient backing? How to determine what type of sources are acceptable?
The approach we took was to:
This is done through the use of a quality field in the teachers section.
This is inspired by the GEDCOM standard used in genealogy, and used for precisely the same reason.
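One consequence of recording quality per teacher link is that conflicting claims can all be kept, with the viewer deciding what level of backing to show. The sketch below assumes that quality values follow GEDCOM's 0-3 certainty assessment (QUAY). That scale is an assumption for illustration, so check the project's own guidelines for the actual values. The entry shapes and IDs are hypothetical.

```python
# Assumption: quality values follow GEDCOM's 0-3 certainty assessment (QUAY).
GEDCOM_QUAY = {
    0: "unreliable evidence or estimated data",
    1: "questionable reliability of evidence",
    2: "secondary evidence",
    3: "direct and primary evidence",
}


def links_with_min_quality(people: dict, min_quality: int) -> list[tuple[str, str, int]]:
    """Return (student, teacher, quality) for every teacher link whose quality
    is at least min_quality; a missing or blank quality counts as 0."""
    links = []
    for student_id, person in people.items():
        for teacher in person.get("teachers", []):
            quality = teacher.get("quality") or 0
            if quality >= min_quality:
                links.append((student_id, teacher["id"], quality))
    return links


# Hypothetical entries: one well-documented link, two weaker ones.
people = {
    "P-2": {"teachers": [{"id": "P-1", "quality": 3}]},
    "P-3": {"teachers": [{"id": "P-2", "quality": 1}, {"id": "P-1"}]},
}
print(links_with_min_quality(people, 2))  # [('P-2', 'P-1', 3)]
print(len(links_with_min_quality(people, 0)))  # 3
```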
There is always a degree of relativity in determining the "quality" of sources, but these guidelines should be good enough to start with.
Some examples:
The more people participate in the project, the better, but the more participants there are, the more likely it is that they will hold different perspectives on what should be added to the database.
The general approach will be to have (public) discussions on the relevant topic, in the form of an Issue. The overall quality of the database is an important goal, so if needed, the project lead will determine the final outcome in the case of no consensus.
Having an extensive database is good, but it is not as important as having a quality database that clearly indicates the sources and how they are used. To make another comparison with genealogy, the Internet is filled with "family trees" that go back centuries, built by people who, in their desire to have a long ancestral line, import other trees built with similar carelessness. The information appears impressive, but even a superficial look shows that most of it is false, untraceable, or unproven.
Sources and photos use URLs, which are external to the project. For sources in particular this is unavoidable, since the goal is to clearly point to where things come from, and most of the time this means a URL (although not always, since we can also add URIs that are not URLs, such as ISBNs).
This leads to the possibility of "link rot", which is when the URLs we use stop working. Broken links should be fixed, since it's important to keep information available (and in the case of photos, a broken link is even more visible).
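Only web URLs can rot in this way; non-URL URIs such as urn:isbn: identifiers cannot be checked by fetching them. A link checker therefore only needs the http(s) entries. The following is a small sketch of that split using Python's `urllib.parse`. `split_sources` is a hypothetical helper, and the URLs and ISBN are placeholders.

```python
from urllib.parse import urlparse


def split_sources(sources: list[str]) -> tuple[list[str], list[str]]:
    """Separate web URLs, which can break over time, from other URIs such as
    urn:isbn: identifiers, which cannot be checked by fetching them."""
    urls, others = [], []
    for source in sources:
        if urlparse(source).scheme.lower() in ("http", "https"):
            urls.append(source)
        else:
            others.append(source)
    return urls, others


urls, others = split_sources([
    "https://example.org/article",  # placeholder URLs and ISBN
    "urn:isbn:0000000000",
    "http://example.com/photo.jpg",
])
print(urls)    # ['https://example.org/article', 'http://example.com/photo.jpg']
print(others)  # ['urn:isbn:0000000000']
```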
To help with this, we regularly collect and scan all the links in the YAML files (using Linkinator), and any broken links are added to the report so that they can be fixed:
- aux/Links.md is a Markdown file with all the links collected from the YAML files.
- aux/broken_links.csv is a CSV file with all the broken links found.
The process is straightforward and it should be easy to act upon the findings; to fix broken links the following is recommended:
Using git, and GitHub, is one of the core concepts in this project because it addresses several requirements that would otherwise require specific solutions:
While there is a learning curve to using it, the advantages far outweigh it. The last point, specifically, means that the data here will always be available, and can at any time be forked ("copied") by anyone. Those who remember the amount of information lost with the demise of judoforum.com will understand why this matters.
I wanted something that would be easy for humans to edit, without special tools, while providing enough structure to be easily parsed. YAML is one of the most obvious choices for this.
Most technological choices were made because they seemed to be the best for the domain, and also because they appeared to be simple enough to get started:
The core of the project is the database, which is ultimately resilient to change because it depends only on the YAML format; the more visible part, however, is the web application. The technology used there can be changed if need be, although Cytoscape.js and Node.js form a core that will almost certainly remain.
The YAML format is a work-in-progress and not written in stone, and some of the terms were a compromise between those that would be more correct for a specific situation and those with a wider scope. An example of this is style, which applies to anything from Kodokan Judo to Catch-as-Catch-Can. As we progress, improvements in terminology can be made.
A YAML schema will be available Real Soon Now, but the only mandatory fields are:
Not mandatory (but almost) are nationality/place of birth/native name, in the sense that they are usually easy to add and can be used for visualisation purposes. The more information, the better.
This is a work-in-progress that will change in the short term, but for now the IDs are just incrementally generated "by hand". When adding something new, use existing IDs (of existing persons/styles) when possible, and leave the ID blank for the new individual; it will be added afterwards.
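Contributors should leave new IDs blank, but it may help to see what "incrementally generated" amounts to for the JDP-<n> person IDs and JDP-S-<n> style IDs that appear in this document. `next_id` is a hypothetical helper that assumes IDs follow exactly those two patterns. It is not the project's actual process, which is currently manual.

```python
import re

PERSON_ID = re.compile(r"^JDP-(\d+)$")    # e.g. JDP-26
STYLE_ID = re.compile(r"^JDP-S-(\d+)$")   # e.g. JDP-S-4


def next_id(existing_ids, pattern: re.Pattern, prefix: str) -> str:
    """Return the next sequential ID after the highest number matching pattern."""
    numbers = [int(m.group(1)) for id_ in existing_ids if (m := pattern.match(id_))]
    return f"{prefix}{max(numbers, default=0) + 1}"


ids = ["JDP-1", "JDP-26", "JDP-S-1", "JDP-S-4"]
print(next_id(ids, PERSON_ID, "JDP-"))    # JDP-27
print(next_id(ids, STYLE_ID, "JDP-S-"))   # JDP-S-5
```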
This is done by adding a new entry to teachers, with the same id but a different style_id, for example:
```yaml
teachers:
  - id: JDP-26 # Tomiki Kenji
    style_id: JDP-S-4 # Taught Aikido
    place:
    period:
      start:
      end:
    # (...)
  - id: JDP-26 # Tomiki Kenji
    style_id: JDP-S-1 # Taught Judo
    place:
    period:
      start:
      end:
    # (...)
```
This will create two separate lines from teacher to student, and will keep all information about that relation specific to it (different locations, time periods, sources, etc.)
(see issue #35 for a discussion.)
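One way to picture why this produces two lines is that each teachers entry becomes its own (student, teacher, style) edge. The sketch also flags the same teacher/style pair listed twice, which is probably a mistake. `relation_edges` and the student ID "P-9" are hypothetical; JDP-26, JDP-S-4 and JDP-S-1 are the IDs from the example above.

```python
from collections import Counter


def relation_edges(student_id: str, teachers: list[dict]) -> list[tuple[str, str, str]]:
    """Build one (student, teacher, style) edge per teachers entry, so one
    teacher who taught two styles yields two separate lines in the tree."""
    edges = [(student_id, t["id"], t["style_id"]) for t in teachers]
    repeated = [edge for edge, count in Counter(edges).items() if count > 1]
    if repeated:  # same teacher and style listed twice: probably a mistake
        raise ValueError(f"duplicate teacher/style entries: {repeated}")
    return edges


teachers = [
    {"id": "JDP-26", "style_id": "JDP-S-4"},  # Tomiki Kenji, Aikido
    {"id": "JDP-26", "style_id": "JDP-S-1"},  # Tomiki Kenji, Judo
]
print(relation_edges("P-9", teachers))  # "P-9" is a made-up student ID
# [('P-9', 'JDP-26', 'JDP-S-4'), ('P-9', 'JDP-26', 'JDP-S-1')]
```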
Yes, all individuals can be selected through URL query parameters, using their id, e.g. https://budotree.judoc.org/?id=JDP-1. This link can easily be obtained with the link-symbol button in the information box footer, and used to share links to specific persons.
The following options can be set as query parameters:
- id: selects the individual with the given id (e.g., id=JDP-1).
- style: selects the style with the given id (e.g., style=JDP-S-1).
- focus: if true, turns on the individual focus mode (e.g., focus=true).
- infobox: sets the information box visibility; allowed values are visible, hidden (default), and toggle (e.g., infobox=visible).
- layout: selects the layout; allowed values are the same as in the layout dropdown (e.g., layout=concentric).
- lang: sets the interface language; allowed values are the same as in the language dropdown (e.g., lang=ja).
They can be combined, for example: https://budotree.judoc.org?id=JDP-12&infobox=visible&focus=true&layout=mrtree&lang=ja.
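Such links can also be generated programmatically. The following is a sketch using Python's `urllib.parse.urlencode` with the parameters listed above. `share_url` is a hypothetical helper, and it checks only the infobox values that are documented here, since the valid layout and lang values depend on the dropdowns.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://budotree.judoc.org/"
INFOBOX_VALUES = {"visible", "hidden", "toggle"}


def share_url(person_id: str, *, style: Optional[str] = None, focus: bool = False,
              infobox: Optional[str] = None, layout: Optional[str] = None,
              lang: Optional[str] = None) -> str:
    """Build a link from the documented query parameters, including only the
    ones that are set; layout and lang values are not validated here."""
    params = {"id": person_id}
    if style:
        params["style"] = style
    if focus:
        params["focus"] = "true"
    if infobox is not None:
        if infobox not in INFOBOX_VALUES:
            raise ValueError(f"infobox must be one of {sorted(INFOBOX_VALUES)}")
        params["infobox"] = infobox
    if layout:
        params["layout"] = layout
    if lang:
        params["lang"] = lang
    return f"{BASE_URL}?{urlencode(params)}"


url = share_url("JDP-12", infobox="visible", focus=True, layout="mrtree", lang="ja")
print(url)
# https://budotree.judoc.org/?id=JDP-12&focus=true&infobox=visible&layout=mrtree&lang=ja
print(parse_qs(urlsplit(url).query)["layout"])  # ['mrtree']
```

The parameters come out in a different order than in the combined example above; this assumes the application does not depend on query-parameter order.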
Different martial arts have different approaches: some have a teacher-student relation that includes rank, while others have central organisations that bestow rank. As such, learning from someone is not always the same as receiving rank from someone, even if it's the teaching that contributes to the body of knowledge. Separating rank from teachers allows us to keep track of the teacher-student relationship without making assumptions about rank.
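To make the separation concrete, the sketch below models a teacher relation and a rank award as two independent records. A student can then have learned from one person while receiving rank from an organisation. Both dataclasses, their fields, and all the values are hypothetical. They are not part of the current YAML format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeacherRelation:
    student_id: str
    teacher_id: str
    style_id: str


@dataclass(frozen=True)
class RankAward:  # hypothetical record, not part of the current YAML format
    person_id: str
    style_id: str
    rank: str
    awarded_by: str  # a person or an organisation


def relations_without_rank_from_teacher(relations, awards):
    """Teacher relations where the teacher is not recorded as having awarded
    the student a rank in that style (e.g. rank came from an organisation)."""
    awarded = {(a.person_id, a.style_id, a.awarded_by) for a in awards}
    return [r for r in relations
            if (r.student_id, r.style_id, r.teacher_id) not in awarded]


relations = [TeacherRelation("P-2", "P-1", "JDP-S-1")]
awards = [RankAward("P-2", "JDP-S-1", "1st dan", "ORG-1")]  # made-up values
print(relations_without_rank_from_teacher(relations, awards))
# [TeacherRelation(student_id='P-2', teacher_id='P-1', style_id='JDP-S-1')]
```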
Yes, this is likely a better idea. Currently, sources are at the "root" level to keep things simple: it's already several orders of magnitude better to have sources listed, and enforce that practice.
Following the genealogy research parallel, it would be better to have sources that can be attached to a specific assertion:
A given source may be the basis for many different assertions. Thus, much of the information is the same for many different citations of that source, such as the publisher information; and yet, some of the information varies from one citation to the next, such as the page number for a specific item. Consequently, the SOURCE_STRUCTURE includes a sophisticated mechanism for sharing general source description information that is common across multiple citations, while at the same time allowing more specific information to be more directly associated with individual citations. All tags within the SOURCE_STRUCTURE participate in this approach.
We need to balance how to do this with keeping it simple enough: as simple as it can be, but not simpler. One way to do it would be:
1. Adding source fields in the specific section (e.g. teachers->[id=<ID of Teacher 1>]->source).
2. Using the uri as the source ID, which would then point to a more complete entry for the source, in a separate YAML file, with name, etc.
3. Adding a source->page or source->citation field.
We will implement the first soon; the second is an open discussion, and the third will depend on how much this becomes a real issue.
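The following sketch shows how the three ideas above could fit together, mirroring the GEDCOM approach of shared source descriptions plus per-citation details. A citation sits on a specific teachers entry (first idea). Its uri keys into a shared registry holding the full description (second idea). Citation-specific fields such as page are added on top (third idea). The registry layout, the field names beyond uri/page, and `resolve_citations` are all hypothetical, and the ISBN is a placeholder.

```python
# Hypothetical shared registry (second idea): full descriptions keyed by uri.
SOURCES = {
    "urn:isbn:0000000000": {"name": "Placeholder book", "publisher": "Placeholder publisher"},
}


def resolve_citations(teacher_entry: dict, sources: dict) -> list[dict]:
    """Merge each citation on a teachers entry (first idea) with the shared
    source description; citation-specific fields such as page (third idea)
    are layered on top."""
    resolved = []
    for citation in teacher_entry.get("source", []):
        uri = citation["uri"]
        if uri not in sources:
            raise KeyError(f"unknown source uri: {uri}")
        resolved.append({**sources[uri], **citation})
    return resolved


entry = {
    "id": "JDP-26",
    "style_id": "JDP-S-1",
    "source": [{"uri": "urn:isbn:0000000000", "page": 42}],  # made-up citation
}
print(resolve_citations(entry, SOURCES))
# [{'name': 'Placeholder book', 'publisher': 'Placeholder publisher', 'uri': 'urn:isbn:0000000000', 'page': 42}]
```

Publisher details are stored once, while every citation can still point to a different page.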