Watts-Lab / atlas

The product of all our research cartography
https://atlas.seas.upenn.edu
GNU Affero General Public License v3.0

🏛️ How does column design work? How can we make it straightforward to design useful columns and validate them against shared robustness goals? #7

Open markwhiting opened 1 year ago

markwhiting commented 1 year ago

Column design is one of the most important aspects of research cartography. In short, we need to find good ways to establish measures we care about, to score them, and to validate them. The process of designing columns is heavily iterative and interconnected with other columns and their performance, as well as with the research direction motivating the cartographic effort (and any evolution of that direction during the mapping process).

So, how is this done now?

It appears each project has had somewhat different approaches to designing and refining columns. At a high level:

  1. there is some process of determining a research requirement that motivates capturing a specific dimension, e.g., one aspect of a task taxonomy, or one property of a regression reported in a paper;
  2. then a mechanism for scoring that dimension is proposed, e.g., a question that tries to quantify the task taxonomy feature;
  3. then that mechanism is tested by collecting scores on the dimension, e.g., by hiring turkers or getting RAs to fill in the column for some sample papers;
  4. then analysis is performed to check how reliable the column appears. This has been somewhat fraught because many properties of columns make the appropriate analysis unclear;
  5. finally, we determine whether the process is good enough, or we return to an earlier step, either to introduce more columns that get at the key issue or to refine the specific process for the column in question.
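Steps 3–5 above could be sketched as a simple agreement check over sample ratings. This is only an illustration: the function names and the 0.8 acceptance threshold are assumptions, not project conventions, and real validation would need measures suited to each data type.

```python
# Illustrative sketch: score a sample (step 3), check reliability via
# mean pairwise exact-match agreement (step 4), and decide whether the
# column is "good enough" (step 5). Threshold and names are assumptions.
from itertools import combinations

def percent_agreement(ratings: dict[str, list]) -> float:
    """Mean pairwise exact-match agreement across raters.

    `ratings` maps rater id -> list of scores, aligned by paper."""
    pairs = list(combinations(ratings.values(), 2))
    if not pairs:
        return 1.0  # a single rater trivially agrees with itself
    agree = [
        sum(a == b for a, b in zip(r1, r2)) / len(r1)
        for r1, r2 in pairs
    ]
    return sum(agree) / len(agree)

def column_passes(ratings: dict[str, list], threshold: float = 0.8) -> bool:
    return percent_agreement(ratings) >= threshold

# Two raters agree on 3 of 4 sample papers -> 0.75, below 0.8.
ratings = {"rater_a": [1, 0, 1, 1], "rater_b": [1, 0, 1, 0]}
```

In practice a chance-corrected statistic (e.g., Krippendorff's alpha) would be preferable, but the decision structure is the same: measure agreement, compare to a pre-registered bar, and loop back if it fails.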

Each step has opportunities for researcher degrees of freedom, and we would ideally like to reduce those as much as possible. The validation and finalization steps are most critical here, because they determine when something becomes a core part of our data. If we do those badly, we get bad data.

A related challenge is that even when validation is done correctly, we can still overfit to the sample that was tested. So another consideration in this process is ensuring that the sampled stimuli (e.g., papers) are sufficiently distributed over the domain of candidates to be a good test of the measure.

Many open questions exist in this process but any thoughts from those actively mapping now would be very helpful. Also, if it's unclear what this is about, reading #1 might help, and asking questions in the comments below is always encouraged!

markwhiting commented 1 year ago

As a related question: what is a column? Or, perhaps more specifically, what specifies a column?

Speaking with @linneagandhi, we agreed that at least a few things are required:

  1. name
  2. unit
  3. description with examples etc.
  4. data options, type and validation

Some columns are dependent on others, or are functionally derivable from others.

The goal is a "description that leads to a reliable response," which of course means that columns are often iterated upon and require some measure of reliability. Reliability is tricky, as there is no good standard that works across all data types. Additionally, when developing columns, it appears that free-text response is a necessary first step so that we can understand the scope of the column. This further challenges reliability and motivates a human review cycle before making higher-level determinations about how data can be validated at a unit level or in aggregate.
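To make the data-type problem concrete: exact match is a reasonable agreement notion for categorical scores but is far too strict for free text, where some softer overlap measure is needed. The token-Jaccard comparison below is one illustrative choice, not a recommendation.

```python
# Sketch of why reliability is hard across data types: exact match
# suits categorical values; free text needs a softer measure.
# Token Jaccard overlap here is an illustrative assumption.
def exact_agreement(a, b) -> float:
    return float(a == b)

def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

# Two raters describing the same context in different words:
r1 = "34 teams of 4 people"
r2 = "34 teams with 4 people each"
# exact_agreement(r1, r2) is 0.0, yet the responses largely agree;
# token_jaccard(r1, r2) is 4/7, capturing the partial overlap.
```

Any such soft measure has its own failure modes (paraphrase, negation), which is part of what motivates the human review cycle mentioned above.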

However, this does suggest that columns could be summarized in a tidy format such as the following:

| name | unit | description | data |
| --- | --- | --- | --- |
| doi | paper | What is the DOI of the paper? | DOI (a subset of URI?) |
| conditions | experiment | What conditions did the experiment have? | free_text list |

(Of course, descriptions are probably much more sophisticated than this example)
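One way to picture this specification is as a typed record, one instance per column. This is a minimal sketch: the four fields follow the table above, while `validator` and `depends_on` are assumed extensions anticipating the validation and dependency points discussed in this thread.

```python
# Minimal sketch of the tidy column specification as a typed record.
# `validator` and `depends_on` are assumed extensions, not settled design.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ColumnSpec:
    name: str            # e.g., "doi"
    unit: str            # e.g., "paper" or "experiment"
    description: str     # prompt shown to raters, with examples etc.
    data: str            # data options / type, e.g., "DOI" or "free_text list"
    validator: Optional[Callable[[str], bool]] = None  # unit-level validation
    depends_on: list[str] = field(default_factory=list)  # related columns

doi = ColumnSpec(
    name="doi",
    unit="paper",
    description="What is the DOI of the paper?",
    data="DOI",
    validator=lambda v: v.startswith("10."),  # crude DOI check (assumption)
)
```

A set of such records is itself tidy data, so "more columns about columns" (validation, aggregation, conceptual source, rating mechanism) would just be more fields on this record.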

Further, the specification may have more aspects around validation, aggregation, conceptual source, rating mechanism, etc. I could imagine those all becoming features in this tidy specification of the set of columns (more columns about columns).

Because our specification encompasses evolution and iterative improvement, we would want to store version information, perhaps as a GitHub blob or something else that formally identifies the current column among all columns.
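Content-addressing in the style of a Git blob would give each version of a column a stable identifier derived from its serialized specification. The sketch below reproduces Git's blob hashing scheme (SHA-1 over `"blob <len>\0<content>"`); the YAML-ish serialization of the spec is an assumption.

```python
# Sketch: identify a column version by hashing its serialized spec the
# way Git hashes a blob. The serialization format is an assumption.
import hashlib

def blob_id(content: str) -> str:
    """Git-style blob id: sha1 of "blob <byte length>\\0" + content."""
    data = content.encode("utf-8")
    header = f"blob {len(data)}\0".encode("utf-8")
    return hashlib.sha1(header + data).hexdigest()

spec_v1 = "name: conditions\nunit: experiment\ndata: free_text list\n"
# Any edit to the spec yields a different id, so every iteration of a
# column is formally distinguishable among all columns and versions.
```

Because the id matches what `git hash-object` would produce, the same specs could be stored as actual GitHub blobs without re-hashing.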

A further note from discussion with @linneagandhi is that columns are often created in groups, or in relation to other columns. For example, you might have a set of columns about how results are reported that are quite intertwined, i.e., if one is true, others are by definition NA or have a required value. That kind of relationship is a little tricky to express in a tidy way, especially as things evolve, so I need to think more about whether column clustering should be formal or informal, or formalized in a higher-level abstraction, e.g., concepts.
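One lightweight way to express such an intertwined group is as an explicit row-level constraint check rather than anything in the tidy table itself. The column names below are invented for illustration.

```python
# Sketch: a row-level check for an intertwined reporting group.
# Column names (results_in_text, results_table, results_figure) are
# hypothetical; NA is modeled as None.
NA = None

def check_reporting_group(row: dict) -> bool:
    """If results are reported in text, table/figure columns must be NA."""
    if row.get("results_in_text"):
        return (row.get("results_table") is NA
                and row.get("results_figure") is NA)
    return True  # constraint only binds when results_in_text is true

ok = {"results_in_text": True, "results_table": NA, "results_figure": NA}
bad = {"results_in_text": True, "results_table": "Table 2", "results_figure": NA}
```

Keeping constraints as named checks like this would let a group of columns evolve together while the per-column specs stay tidy, though it does push the clustering into code rather than data.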

markwhiting commented 1 year ago

Another question that came up in discussion with @xehu is something like "which columns matter when?"

In her case, she is taking in free text versions of several columns that are not required as machine readable at this time, e.g., context where the response might be something like:

> 34 teams of 4 people based in a bike based tulip delivery startup in the Netherlands

This is in contrast to some of our other mapping efforts where the goal of commensurability has driven us to exhaustively decompose columns like context, e.g., into team_count, team_size, type_of_flower etc.

Of course, at a later time, the context column might be split into a series of more specific ones, but having this less formal column makes encoding easier and makes aspects of the final column design more effectively asynchronous: with the informal column in hand, we hopefully have the data to build the more formal columns later.

One design pressure or consideration here might be to make sure people:

  1. make informal columns that are relatively commensurate (so that if two researchers have a similar column of this type, their data can still be compared),
  2. promote mechanistic processes for these types of columns, e.g., copy and paste the section of the paper that describes its context,
  3. provide easy detailed versions of columns that can enrich the data in real time, e.g., make team_size a straightforward extension, so that the captured data can quickly be upscaled either at the time of recording or when columns are further detailed.
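Pressure 3 above could be sketched as a promoter that derives formal columns from the informal free text at capture time. The regex patterns below are illustrative assumptions, nowhere near a robust parser, and the derived column names just echo the examples in this thread.

```python
# Sketch: promote an informal free-text `context` column into formal
# team_count / team_size columns at recording time. Patterns are
# illustrative assumptions, not a real extraction pipeline.
import re

def promote_context(context: str) -> dict:
    out = {"context": context, "team_count": None, "team_size": None}
    m = re.search(r"(\d+)\s+teams?", context)
    if m:
        out["team_count"] = int(m.group(1))
    m = re.search(r"teams?\s+of\s+(\d+)", context)
    if m:
        out["team_size"] = int(m.group(1))
    return out

row = promote_context(
    "34 teams of 4 people based in a bike based tulip delivery startup"
)
```

The informal column is kept alongside the derived ones, so later, more careful decompositions (type_of_flower, country, etc.) can still be made from the original text.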

A related aspect reflected in this conversation was that a map may not be the desired output of a mapping process. In this case, the output is closer to a list of theory operationalizations within a certain domain. This is interesting because it effectively looks at only one dimension of the map at a time, which is not a view we have previously engaged with deeply (perhaps also relevant to views discussion #4).