ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Combine reader-workshop and reader-extras to create reader-cord-toolbox #98

Closed ericleasemorgan closed 4 years ago

ericleasemorgan commented 4 years ago

The Reader creates a complex data set -- called a "study carrel" -- described in the following blog posting:

http://sites.nd.edu/emorgan/2019/12/reader-manifest/

For a number of reasons, these study carrels are only slightly interactive. To get the most out of one or more study carrels, the student, researcher, or scholar ought to import study carrel data into their own data analysis applications, or use tools specifically designed for such purposes. Put another way, it is not a trivial process to create a feature-rich Web interface to the study carrels. This is true for at least two reasons: 1) we don't know the research questions to be answered, and 2) creating Web interfaces is notoriously difficult, if not hazardous to the platform hosting a carrel.

On the other hand, because each study carrel has the same structure, command-line interfaces are rather trivial to create. Moreover, a command-line tool will work across study carrels, if not collections of study carrels. To these ends two additional Reader repositories exist:

  1. reader-workshop - https://github.com/ericleasemorgan/reader-workshop
  2. reader-extras - https://github.com/ericleasemorgan/reader-extras

The bulk of reader-workshop is a manual for using Distant Reader Classic, but it also contains a bunch o' scripts (written in different languages) to do interesting analysis against a study carrel. The second repository -- reader-extras -- is much like reader-workshop sans the manual. Using one or the other of the repositories, a person can glean all sorts of things not accessible from the Web. Examples include:

Your mission, if you choose to accept it, is to:

  1. create a new repository called reader-cord-toolbox
  2. copy all the files in the aforementioned repositories to reader-cord-toolbox
  3. edit the newly created repository's README file to reflect the good work you have done

Once this task is done, the next steps will be to curate and enhance the content of the toolbox, and the results will work specifically with study carrels created from our CORD data set.

dbrower commented 4 years ago

It is possible to do this by merging the two repositories directly in git. However, a few files exist in both repositories. Should files from one be favored over files from the other? I can work through them if I know which repo has the newer versions.

Files that are in both repositories and are different:

A README.md
A bin/add-metadata.pl
A bin/carrel2diagram.sh
A bin/carrel2json.py
A bin/classify.pl
A bin/cluster.py
A bin/concordance.pl
A bin/db2malletcsv.sh
A bin/harvest.sh
A bin/list-questions.pl
A bin/list-questions.sh
A bin/word2hypernym.py
A etc/template-diagram.htm
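
A list like the one above could be produced with a short script. What follows is only a minimal sketch, not the method actually used here: it assumes both repositories are checked out as local directories, and the function names `relative_files` and `differing_common_files` are made up for illustration.

```python
import filecmp
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Return every file under root as a relative POSIX path, skipping .git."""
    return {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and ".git" not in path.relative_to(root).parts
    }


def differing_common_files(repo_a: Path, repo_b: Path) -> list[str]:
    """List files that exist in both trees but whose contents differ."""
    common = relative_files(repo_a) & relative_files(repo_b)
    return sorted(
        name
        for name in common
        # shallow=False forces a byte-by-byte comparison of the two files
        if not filecmp.cmp(repo_a / name, repo_b / name, shallow=False)
    )
```

Calling `differing_common_files(Path("reader-workshop"), Path("reader-extras"))` from the directory holding both checkouts (an assumed layout) would return the relative paths of the conflicting files.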

ericleasemorgan commented 4 years ago

Should one repository be used over the other? Yes, we want to prioritize the content in reader-workshop.
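
One way to apply that priority when combining the files by plain copying is sketched below. This is an illustration under assumptions, not what was actually done: unlike a merge in git, it does not preserve commit history, and the helper names `copy_tree` and `combine` are hypothetical.

```python
import shutil
from pathlib import Path


def copy_tree(source: Path, target: Path) -> None:
    """Copy every file under source into target, overwriting existing files."""
    for path in source.rglob("*"):
        relative = path.relative_to(source)
        # Skip directories themselves and anything inside .git
        if ".git" in relative.parts or not path.is_file():
            continue
        destination = target / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)


def combine(preferred: Path, secondary: Path, target: Path) -> None:
    """Combine two trees into target; files from preferred win on conflict."""
    copy_tree(secondary, target)  # lower-priority files go in first
    copy_tree(preferred, target)  # preferred files overwrite any duplicates
```

With local checkouts in the working directory (an assumed layout), `combine(Path("reader-workshop"), Path("reader-extras"), Path("reader-cord-toolbox"))` would leave the reader-workshop version of every file that exists in both repositories.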

dbrower commented 4 years ago

@ericleasemorgan I can't make a new repo in your github namespace. Could you make it and give me write access? Alternatively, I could consolidate both into one of the existing ones.

ericleasemorgan commented 4 years ago

Don, I have created a new repository -- reader-cord-toolbox -- and I have granted you access to it. I believe you can now merge reader-workshop and reader-extras into reader-cord-toolbox. If so, and once this is done, the next step will be to curate and enhance the items in reader-cord-toolbox.

ericleasemorgan commented 4 years ago

Done. See https://github.com/ericleasemorgan/reader-cord-toolbox. Closing.