Judo-Documentation-Project / budotree

Judo Lineage Tree
GNU General Public License v3.0

Budō Lineage Tree

Budō Documentation Project

The Budō Lineage Tree is a community-driven database of teacher/student interactions, presented as an interactive lineage tree.

There are three main components in the project:

While this is done with Judo as a starting point, it is not limited to it: much like Judo had a pioneering role in Budō that influenced many other martial arts (and was in many ways a bridge between Koryū and Gendai), so does this project start from Kodokan Judo with the aim of uncovering the rich history of interactions between disciplines.

How it works

Creating, adding to, and correcting the YAML files is what drives everything else.

Each YAML file describes an individual who has at least one "teacher" (see the FAQ for terms). Relationships always point from student -> teacher, never the other way around. The YAML files therefore name teachers explicitly, but not students: a person's students appear in the tree through the students' own files.
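As a rough illustration of how students can be derived from student -> teacher links, here is a minimal sketch. It assumes the YAML files have already been parsed into plain objects; the `X-…` IDs and the top-level `id` field are hypothetical, and this is not necessarily how the project builds its tree.

```javascript
// Hypothetical, already-parsed entries: each person lists only their teachers.
const people = [
  { id: "X-1", teachers: [] },
  { id: "X-2", teachers: [{ id: "X-1" }] },
  { id: "X-3", teachers: [{ id: "X-1" }, { id: "X-2" }] },
];

// Invert the student -> teacher links into a teacher -> students index.
function studentsByTeacher(entries) {
  const index = new Map();
  for (const person of entries) {
    for (const teacher of person.teachers ?? []) {
      if (!index.has(teacher.id)) index.set(teacher.id, []);
      index.get(teacher.id).push(person.id);
    }
  }
  return index;
}

const students = studentsByTeacher(people);
console.log(students.get("X-1")); // [ 'X-2', 'X-3' ]
console.log(students.get("X-2")); // [ 'X-3' ]
```

No file ever lists students, yet every teacher's students can be recovered from the other files.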

The YAML format covers these main areas:

How to contribute

The entire process has been built (purposely) around git, and specifically GitHub (more on that in the FAQ below). To participate, you'll need:

The GitHub account is the easy part. If you already know git and are used to git-based workflows, there's no secret here: feel free to clone it or fork it, create a branch, edit files with any editor, and submit the PR.

If the previous paragraph was cryptic, GitHub's interface will mostly guide you through until we improve the instructions:

  1. Find an existing file, or identify a missing one about someone that you want to add. The explorer adds a Source YAML link to every entry that leads directly to the right file.
  2. Create a new file based on an existing entry, or on a template, or click "Edit" in an existing file.
  3. This will create a fork of the repository in your account; edit the file and follow the instructions to commit to your copy, and submit a Pull Request.
  4. In the Pull Request discussion, address any comments/requests.

The above can still be challenging for someone completely new to GitHub, but we will improve the instructions in due time.

We will start by focusing on "leaves" that can link to some of the existing "nodes": that is, on finding branches that connect to any of the existing individuals, instead of adding unconnected persons.

At a later stage we will relax this requirement, but for now any addition should be connected through at least one path. It's perfectly fine to add more "ancestors" that end up being unconnected, as long as they are connected to someone who is linked with existing individuals.

Is there any other way?

If the above is impossibly difficult, there's an alternative: open an Issue with the information that you would like to add/change. This can take more time, but eventually it should make its way to the database, after someone picks it up and makes the corresponding changes.

To open an issue, find the "Issues" tab and create a new one.

The importance of sources

Every entry should have one or more sources that allow anyone to determine where the information comes from. This might seem overkill for well-known aspects of well-known people, but it's very important to be consistent about it: the reality is that our knowledge is often based on a mix of myths, half-truths, unconfirmed events, and partial understanding of real events. By explicitly adding sources, we can at least clearly identify the origin of the information.

Not all sources have the same status: a random comment in an internet forum, by an anonymous user who doesn't state the origin of the information, is clearly less authoritative than a published paper that underwent peer review. This is not to say that one is right and the other is wrong: merely that, faced with conflicting sources, those that identify their own sources carry more weight.

Sources and lineages

How should we reflect conflicting information about lineage-related aspects? Should we show only paths that can be shown to have sufficient backing? How to determine what type of sources are acceptable?

The approach we took was to:

This is done through the use of a quality field in the teachers section. This is inspired by the GEDCOM standard used in genealogy, and used precisely for the same reason.
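To make the idea concrete, here is a minimal sketch of filtering teacher relations by quality. It assumes a numeric scale modelled on GEDCOM's 0 to 3 certainty assessment (0 = unreliable, 3 = direct and primary evidence); the project's actual values may differ, and the `X-…` IDs are hypothetical.

```javascript
// Hypothetical teacher entries; the numeric 0-3 scale is an assumption
// borrowed from GEDCOM, not confirmed by the project.
const teachers = [
  { id: "X-1", quality: 3 },
  { id: "X-2", quality: 1 },
  { id: "X-3" }, // no quality recorded
];

// Keep only relations at or above a minimum quality.
// Treating a missing value as 0 is also an assumption.
function withMinimumQuality(list, minimum) {
  return list.filter((teacher) => (teacher.quality ?? 0) >= minimum);
}

console.log(withMinimumQuality(teachers, 2).map((t) => t.id)); // [ 'X-1' ]
```

A filter like this would let a viewer show every recorded path or only the well-supported ones, without removing anything from the database.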

There is always a degree of relativity in determining the "quality" of sources, but these guidelines should be good enough to start with.

Some examples:

Settling disputes

The more people participate in the project, the better, but more participants also means more differing views on what should be added to the database.

The general approach will be to have (public) discussions on the relevant topic, in the form of an Issue. The overall quality of the database is an important goal, so if needed, the project lead will determine the final outcome in the case of no consensus.

Having an extensive database is good, but it is not as important as having a quality database that clearly indicates the sources and how they are used. Making another comparison with genealogy, the Internet is filled with "family trees" that go back centuries, built by people who, in their desire to have a long ancestry line, import other trees that were built with similar carelessness. The information appears impressive, but a superficial look into it shows that most of it is false, untraceable, or unproven.

FAQ

How to deal with broken links?

Sources and photos use URLs, which are external to the project. For sources in particular this is unavoidable, since the goal is to clearly point to where things come from, and most of the time this means a URL (although not always, since we can also add URIs that are not URLs, such as ISBN numbers).

This leads to the possibility of "link rot", which is when the URLs we use stop working. This should be fixed since it's important to keep information available (and in the case of photos, it's even more visible).

To help with this, we regularly collect and scan all the links in the YAML files (using Linkinator), and any broken links are added to the report so that they can be fixed:
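The collection step can be pictured with a minimal sketch like the one below. It is an illustration only, not the project's actual script: it does not use Linkinator, the YAML snippet and its field names are hypothetical, and it simply pulls `http(s)` URLs out of raw text with a regular expression.

```javascript
// Hypothetical YAML content, kept as plain text (no YAML parser needed here).
const yamlText = [
  "name: Example Person",
  "photo:",
  "  url: https://example.org/photo.jpg",
  "sources:",
  "  - uri: https://example.org/article",
  "  - uri: urn:isbn:0000000000",
  "  - uri: https://example.org/article",
].join("\n");

// Collect unique http(s) URLs. URIs that are not URLs (e.g. ISBNs) are
// skipped, since there is nothing to fetch for them.
function collectUrls(text) {
  const matches = text.match(/https?:\/\/[^\s"'<>]+/g) ?? [];
  return [...new Set(matches)];
}

console.log(collectUrls(yamlText).join("\n"));
// https://example.org/photo.jpg
// https://example.org/article
```

The resulting list is what a link checker would then visit. The pattern is deliberately simple and would also capture punctuation glued to the end of a URL.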

The process is straightforward and it should be easy to act upon the findings; to fix broken links the following is recommended:

Technology

Why Git/GitHub

Using git, and GitHub, is one of the core concepts in this project because it addresses several requirements that would otherwise require specific solutions:

While there is a learning curve to using it, the advantages overwhelmingly outweigh it. The last point, specifically, means that the data here will always be available, and can be forked ("copied") by anyone at any time. Those who remember how much information was lost with the demise of judoforum.com will understand why this matters.

Why YAML

I wanted something that would be easy to edit by humans, without special tools, while providing enough structure to be easily parsed. YAML is one of the most obvious choices for this.

Why JavaScript/Bulma/Cytoscape.js/...

Most technological choices were made because they seemed to be the best for the domain, and also because they appeared to be simple enough to get started:

The core of the project is the database, which is resilient to change because it depends only on the YAML format. The more visible part is the web application, whose technology can be changed if need be; Cytoscape.js and Node.js form the core that will almost surely remain.

YAML format

Martial arts, styles, sports: what terminology is used

The YAML format is a work-in-progress and not written in stone, but some of the terms were a compromise between those that would be more correct to a specific situation, and those that had a wider scope. An example of this is style, which will be applied to anything from Kodokan Judo to Catch-as-Catch-Can. As we progress, improvements in terminology can be made.

Is everything mandatory? Is everything optional?

A YAML schema will be available Real Soon Now, but the only mandatory fields are:

Not mandatory (but almost) are nationality/place of birth/native name, in the sense that they are usually easy to add and can be used for visualisation purposes. The more information, the better.

What are the IDs? Should I add them?

This is a work-in-progress that will change in the short term, but for now the IDs are just incrementally generated "by hand". When adding something new, use existing IDs (of existing persons/styles) where possible, and leave the ID blank for the new individual: it will be added afterwards.

How can we represent a single teacher teaching different styles to the same student?

This is done by adding a new entry to teachers, with the same id, but different style_id, for example:

  teachers:
    - id: JDP-26            # Tomiki Kenji
      style_id: JDP-S-4     # Taught Aikido
      place:
      period:
        start:
        end:
      # (...)
    - id: JDP-26            # Tomiki Kenji
      style_id: JDP-S-1     # Taught Judo
      place:
      period:
        start:
        end:
      # (...)

This will create two separate lines from teacher to student, and will keep all information about that relation specific to it (different locations, time periods, sources, etc.)

(see issue #35 for a discussion.)
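To show why the example above produces two lines, here is a minimal sketch that turns each `teachers` entry into its own edge. The student ID `X-100` and the edge ID format are hypothetical. The `{ data: { id, source, target } }` shape follows the element format Cytoscape.js uses for edges, but whether the project builds its edges exactly this way is an assumption.

```javascript
// Parsed form of the example above, attached to a hypothetical student.
const student = {
  id: "X-100",
  teachers: [
    { id: "JDP-26", style_id: "JDP-S-4" }, // Tomiki Kenji, Aikido
    { id: "JDP-26", style_id: "JDP-S-1" }, // Tomiki Kenji, Judo
  ],
};

// One edge per teachers entry. Including style_id in the edge ID keeps the
// two relations with the same teacher distinct.
function edgesFor(person) {
  return (person.teachers ?? []).map((teacher) => ({
    data: {
      id: `${teacher.id}>${person.id}:${teacher.style_id}`,
      source: teacher.id, // lines run from teacher...
      target: person.id, // ...to student
      style: teacher.style_id,
    },
  }));
}

const edges = edgesFor(student);
console.log(edges.length); // 2
console.log(edges.map((edge) => edge.data.id));
// [ 'JDP-26>X-100:JDP-S-4', 'JDP-26>X-100:JDP-S-1' ]
```

Because each entry becomes a separate edge, place, period, and other details stay attached to the specific relation they describe.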

Can we link directly to an individual?

Yes, every individual can be selected through a URL query parameter using their id, e.g. https://budotree.judoc.org/?id=JDP-1. The link is easy to obtain from the button with a link symbol in the information box footer, and can be used to share links to specific persons.

Is there anything more we can set through the URL?

The following options can be set as query parameters:

They can be combined, for example: https://budotree.judoc.org?id=JDP-12&infobox=visible&focus=true&layout=mrtree&lang=ja.
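For anyone generating such links programmatically, here is a minimal sketch using the standard `URL` API. The helper name `personLink` is hypothetical; the parameter names and values are taken only from the example URL above.

```javascript
// Build a link to a person, with optional extra query parameters.
function personLink(id, options = {}) {
  const url = new URL("https://budotree.judoc.org/");
  url.searchParams.set("id", id);
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const link = personLink("JDP-12", {
  infobox: "visible",
  focus: true,
  layout: "mrtree",
  lang: "ja",
});
console.log(link);
// https://budotree.judoc.org/?id=JDP-12&infobox=visible&focus=true&layout=mrtree&lang=ja

// Reading the parameters back works the same way.
console.log(new URL(link).searchParams.get("layout")); // mrtree
```

Using `searchParams` rather than string concatenation takes care of the `?` and `&` separators and of escaping any special characters in values.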

Why is rank separate from teachers?

Different martial arts have different approaches: some have a teacher-student relation that includes rank, while others have central organisations that bestow rank. As such, learning from someone is not always the same as receiving rank from someone, even if it's the teaching that contributes to the body of knowledge. Separating rank from teachers lets us keep track of the teacher-student relationship without making assumptions about rank.

Shouldn't sources be attached to a specific section (teachers, rank), instead of being applied to everything?

Yes, this is likely a better idea. Currently, sources are at the "root" level to keep things simple: it's already several orders of magnitude better to have sources listed, and enforce that practice.

Following the genealogy research parallel, it would be better to have sources that can be attached to a specific assertion:

A given source may be the basis for many different assertions. Thus, much of the information is the same for many different citations of that source, such as the publisher information; and yet, some of the information varies from one citation to the next, such as the page number for a specific item. Consequently, the SOURCE_STRUCTURE includes a sophisticated mechanism for sharing general source description information that is common across multiple citations, while at the same time allowing more specific information to be more directly associated with individual citations. All tags within the SOURCE_STRUCTURE participate in this approach.

We need to balance how to do this with keeping it simple enough - as simple as it can be, but not simpler. One way to do it would be:

  1. Add source fields in the specific section (e.g. teachers->[id=<ID of Teacher 1>]->source).
  2. Use the uri as the source ID, which would then point to a more complete entry for the source, in a separate YAML, with name, etc.
  3. Add a source->page or source->citation field.

We will implement the first shortly enough; the second is an open discussion, and the third will depend on how much this becomes a real issue.
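To make the three steps above easier to discuss, here is a minimal sketch of how they could fit together once the data is parsed. Nothing here is implemented in the project: the catalogue, its contents, the field layout, and the `resolveCitation` helper are all hypothetical.

```javascript
// Hypothetical shared source catalogue, keyed by URI (step 2).
const sources = {
  "urn:isbn:0000000000": { name: "Example Book", publisher: "Example Press" },
};

// Hypothetical teacher entry carrying its own source (step 1),
// with a citation-specific page number (step 3).
const teacher = {
  id: "JDP-26",
  style_id: "JDP-S-1",
  source: { uri: "urn:isbn:0000000000", page: 42 },
};

// Merge the shared description with the citation-specific details.
function resolveCitation(entry, catalogue) {
  const reference = entry.source;
  if (!reference) return null;
  const shared = catalogue[reference.uri] ?? {};
  return { ...shared, ...reference };
}

const citation = resolveCitation(teacher, sources);
console.log(citation.name, citation.page); // Example Book 42
```

This mirrors the split described in the quoted passage: details common to every citation of a source live in one place, while the page number stays with the individual assertion.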