ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

add-context.pl need to sort out permissions for running #107

Open nkmeyers opened 4 years ago

nkmeyers commented 4 years ago

Can you help me walk through a problem I've encountered where add-context.pl is not working how I'd expect? My carrel is /export/reader/carrels/BCGVaccine

I wanted to change the content of the carrel's "home page", so I made a contexts file to go with it. File I made: /export/cord/etc/contexts/BCGVaccine.txt

Then I wanted to enhance the carrel using that content, so I cd'd to /export/cord, where I figured I'd run:

sudo /export/cord/bin/add-context.pl carrelname

That outputs to the screen. If you "like" the output, then you really write it to a file by doing this:

/export/cord/bin/add-context.pl > /export/reader/carrels//index.htm

So I tried to run it as:

sudo /export/cord/bin/add-context.pl BCGVaccine > /export/reader/carrels/BCGVaccine/index.html

but even with sudo I got permission denied on the filesystem unless I try to write to .htm instead of .html? I have no clue why. So then I tried this:

sudo /export/cord/bin/add-context.pl BCGVaccine > /export/reader/carrels/BCGVaccine/index.htm

but then the carrel's index.htm just zeroed out to empty :weary:

ericleasemorgan commented 4 years ago

This is just a test. Where does this email message go? --ELM

ericleasemorgan commented 4 years ago

Attached ought to be my second attempt at a carrel creation tutorial, and it includes "extra credit" instructions for adding context. More later.

How to create a study carrel: Tutorial # 3

This is a recipe for study carrel creation. Like any recipe it is really an outline of what needs to be done.

First, create an SSH public key and send it to Eric C., who will create an account on our system.

Second, log into our HTTP host making sure to map TCP ports along the way. People who use PuTTY as their ssh client ought to be able to open their terminal ("cmd") and issue the following command:

putty -L localhost:8080:localhost:8080 149.165.170.150

People who use a Macintosh (or Linux or the absolute latest & greatest version of Windows) ought to be able to open their terminal ("Terminal") and issue the following command:

ssh -L localhost:8080:localhost:8080 149.165.170.150

Third, use a Web browser to search our Solr index at http://localhost:8080. Use the words AND, OR, or NOT to create Boolean queries. There are a number of fields available for searching. They include: title, authors, year, entity, type, keywords, and journal. One can also exploit truncation (wildcard matching) through the use of an asterisk ("*"). Example queries include:

* foo
* foo bar
* "foo bar"
* baz*

* foo AND bar
* foo OR bar
* (foo OR bar) NOT baz

* title:foo
* year:bar
* keywords:baz

Here's a helpful hint: search for everything with simply the asterisk symbol; submit the query "*", sans the quote marks. The result will be the totality of records in the system. The searcher can then use the resulting hyperlinked facets to narrow the results.

Fourth, once a satisfying set of results has been created, use the "queue the creation of a study carrel" link to initialize a study carrel. In the resulting HTML form supply a name of the carrel and make absolutely sure the name is a single "word", where a word contains upper or lower-case letters, numbers, and/or characters such as "-" or "_", and zero spaces. Example carrel names ("words") include:

Make your life easy. Use lowercase letters, numbers, and the dash character ("-").

If a carrel already exists with the name one enters, then the existing carrel will be overwritten. This is by design and considered a good thing. If people want to keep their carrels separate from other people's, then consider prefixing carrel names with initials. An example might be elm-medical-care. People ought to feel free to create as many carrels as they desire, even if the content is test-like; if carrels are intended as tests, then consider prefixing carrels with the word... test. An example might be test-medical-care.
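The naming rules above can be checked before submitting the form. What follows is only a sketch: the function names are hypothetical and are not part of the Distant Reader. The first test accepts any single "word" as described in Step # 4, and the second accepts only the recommended form of lowercase letters, numbers, and dashes.

```shell
# Hypothetical helpers; they are not part of the Distant Reader itself.

# Succeed if NAME is a single "word": letters, numbers, "-" or "_", no spaces.
is_valid_carrel_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_-]+$'
}

# Succeed only for the recommended form: lowercase letters, numbers, dashes.
is_recommended_carrel_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9-]+$'
}
```

For example, is_valid_carrel_name "medical care" fails because of the space, while is_recommended_carrel_name "test-medical-care" succeeds.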

Fifth, wait. If nothing goes wrong, then the carrel will be initialized in 60 seconds. After another 120 seconds (or so), the carrel will begin to be processed. Continually reload the URL returned by Step # 4, and one ought to see changes. If no changes are seen after 5 minutes, then call Eric. Otherwise, wait. The carrel is building, and depending on the number of things in the carrel, the building process will require between 15 minutes and many hours. Again, one can monitor progress by continually reloading the URL returned by Step # 4. To get more detail, consider "drilling down" the multitude of directories in the study carrel. If you REALLY want to see what is going on, then open your study carrel's "standard-error.txt" file, and you will get a step-by-step rendition of the building process.

Sixth, your study carrel is finished building when you load the URL from Step # 4 and you see an HTML page. Congratulations, you're done. If you get a blank page, then the carrel is almost done. Wait some more. If an HTML page never returns, then: 1) load the log file (standard-error.txt), and 2) call Eric. What's really cool about this result is two-fold. First, you can share the URL with your friends, colleagues, etc. Second, you will be able to download the whole carrel and open it on your computer where it will be 100% functional. As a bonus, you could save the carrel on a different Web server, and it will be 100% functional there as well.
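For people with shell access, the same three states (no page yet, blank page, HTML page) can be read from the file system instead of the browser. This is a sketch under an assumption: it supposes the page behind the carrel URL is a file named index.htm in the carrel directory. The function name is hypothetical.

```shell
# Hypothetical helper. Assumption: the carrel's home page is DIR/index.htm.
carrel_status() {
  index="$1/index.htm"
  if [ ! -e "$index" ]; then
    echo "no-index-yet"   # nothing written so far; keep waiting
  elif [ ! -s "$index" ]; then
    echo "blank"          # the file exists but is empty
  else
    echo "ready"          # the file has content
  fi
}
```

One might call it as carrel_status /export/reader/carrels/elm-medical-care, where the carrel name is only an example.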

Adding context

For extra credit (think, "icing on the cake"), one can add context (a title, a scope note, provenance, a date, and authorship) to a study carrel. This is an advanced technique and it is still in development. Here's how:

  1. Ssh to 149.165.170.150.

  2. Duplicate any of the files in /export/cord/etc/contexts making sure the duplicate file has the same name as the study carrel in question.

  3. Edit the duplicated file and give values to each of the named fields (LONGNAME, SCOPENOTE, CREATOR, EMAIL, CREATIONDATE). Keep the values VERY simple, and make sure each name-value pair is delimited by a tab character.

  4. Test your edits by first navigating to the cord directory (cd /export/cord/) and then running ./bin/add-context.pl where

    is the name of the study carrel. The result will be a stream of HTML. Peruse the HTML. Is it what you desire? If not, then go to Step # 3. If so, then continue.
  5. Implement your edits by redirecting the resulting HTML to a file, like this: ./bin/add-context.pl > /export/reader/carrels//index.html where is the name of the study carrel.

By this point one ought to be able to open up the study carrel's URL, and the context will be seen at the top of the resulting page.
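Because Step # 3 depends on tab characters, which are easy to lose in an editor, it may help to check the contexts file before running add-context.pl. The sketch below assumes one name-value pair per line, which the tutorial implies but does not state; the function name is hypothetical.

```shell
# Hypothetical helper. Assumption: one tab-delimited name-value pair per line.
# Prints each required field that is missing or has no value after the tab,
# and exits non-zero if any field has a problem.
check_context_file() {
  awk -F '\t' '
    BEGIN { n = split("LONGNAME SCOPENOTE CREATOR EMAIL CREATIONDATE", want, " ") }
    NF >= 2 && $2 != "" { seen[$1] = 1 }
    END {
      bad = 0
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) { print "missing or empty: " want[i]; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

A line that uses spaces instead of a tab is reported as missing, which is the most likely mistake to catch here.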

"Fun with distant reading!"

-- Eric Lease Morgan emorgan@nd.edu June 20, 2020 (First Day of Summer)

ericleasemorgan commented 4 years ago

Wiped out your index.htm page? Bummer!!

If this is true, then this is because you redirected the output of ./bin/add-context.pl to index.htm instead of index.html. Your index.htm file has gone to Big Byte Heaven and must be recreated. The easiest and safest way to resolve this issue is to:

1. cd /export/reader/carrels/BCGVaccine
2. sbatch make-carrel.slurm
3. wait until the carrel is re-built
4. re-run add-context.pl making sure to redirect the output to index.html

An advanced technique is to:

1. salloc
2. wait for a node to warm up
3. ssh compute-0 (or whatever the name of the node is)
4. cd /export/reader/carrels/BCGVaccine
5. /export/reader/bin/carrel2about.py > index.htm
6. wait
7. when you get your prompt back, exit
8. re-run add-context.pl making sure to redirect the output to index.html

The first technique is easy and straightforward. The second technique is faster but more techno-weenie.
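One shell-level detail is worth knowing here: in "command > file" the shell empties the file before the command starts, so if the command then fails, prints nothing, or happens to read that same file, the result is an empty file. A way to guard against this is to write to a temporary file first and move it into place only when there is output. This is a sketch, not how the Reader does it; the function name is hypothetical, and the chmod assumes the Web server needs the page to be world-readable.

```shell
# Hypothetical helper: run a command and install its output at TARGET only
# if the command succeeds and prints something. Otherwise TARGET is untouched.
write_if_ok() {
  target="$1"; shift
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
    chmod 644 "$tmp"      # mktemp creates private files (assumption: page must be world-readable)
    mv "$tmp" "$target"   # replace the old page in one step
  else
    rm -f "$tmp"
    echo "output not written; $target left unchanged" >&2
    return 1
  fi
}
```

From /export/cord, usage might look like: write_if_ok /export/reader/carrels/BCGVaccine/index.html ./bin/add-context.pl BCGVaccine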

ericleasemorgan commented 4 years ago

P.S. If you get permission errors, then use sudo in conjunction with add-context.pl, but the real solution is to change the permissions of the study carrels so our group can more easily redirect output. If problems still persist, then the problem is probably with our shared file system.

The real real solution is to amalgamate our CORD and Reader repositories, and that is in the works.
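A note on sudo and redirection: in "sudo command > file", the redirection is performed by the calling user's own shell before sudo starts, so the file is opened with the caller's rights, not root's. That is one way to get "permission denied" even with sudo. A common workaround is to let a privileged tee do the writing. The sketch below is an illustration only; the function name and the ELEVATE variable are hypothetical, the latter added so the function can be tried without root (ELEVATE=env).

```shell
# Hypothetical helper: run a command as yourself, but write its output to
# TARGET with elevated rights. ELEVATE defaults to sudo.
write_elevated() {
  target="$1"; shift
  # tee opens TARGET under ${ELEVATE}, so the write no longer depends on
  # the caller's own permissions; tee's copy to the screen is discarded.
  "$@" | ${ELEVATE:-sudo} tee "$target" > /dev/null
}
```

Note that tee also empties the target as soon as it starts, so this does nothing to protect a page when the command produces no output.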

nkmeyers commented 4 years ago

Cool, I'll try again following the steps above either sometime today after a boat ride and the Belmont or tomorrow and share my antibody and BCG carrels if they are a success!

nkmeyers commented 4 years ago

@ericleasemorgan Kicked off job 2600 using sbatch make-carrel.slurm at /export/reader/carrels/BCGVaccine but looks like it queued and then barfed?

So, tried another one, job 2601, using sudo sbatch make-carrel.slurm at /export/reader/carrels/humidity . . . that one made it to big-compute-2, completed on June 21 at 17:16, and wrote a new study-carrel.zip file, but its index.htm at /export/reader/carrels/humidity is EMPTY? And the TSVs were all created but they're empty too: just a header row and no content.

So trying job 2604 at /export/reader/carrels/BCGVaccine to see if I get an index.htm with any content at /export/reader/carrels/BCGVaccine from that job.

nkmeyers commented 4 years ago

I was able to add context to the SusceptibilityGender2 carrel. I didn't run make-carrel.slurm, but I was able to run add-context.pl (I did need to relax permissions on the directory first; I couldn't write to index.html, but could write to home.html and could then cp that to index.html).

I did reset the permissions back the way the others appeared to be afterward, but please check the wx filesystem permissions for the carrel dir SusceptibilityGender2 and its files.

ericleasemorgan commented 4 years ago

Yes, this is probably the case for everybody. There are two possible solutions: 1) run the carrel creation process as a generic user and explicitly make the generic user's content readable by a group, or 2) run some sort of chown command over the whole of a carrel just before it is finished.
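The second option might be sketched as follows. This is an assumption-laden illustration, not the fix that was adopted: it uses chgrp and chmod rather than chown (changing a file's owner typically requires root), and the function name and the group are hypothetical.

```shell
# Hypothetical helper: make a whole carrel readable and writable by a group.
# Usage: share_carrel DIRECTORY GROUP
share_carrel() {
  dir="$1"; group="$2"
  chgrp -R "$group" "$dir" &&     # hand the whole tree to the shared group
  chmod -R g+rwX "$dir" &&        # group may read, write, and enter directories
  find "$dir" -type d -exec chmod g+s {} +   # new files inherit the directory's group
}
```

Run as the last step of carrel creation, something like share_carrel /export/reader/carrels/BCGVaccine GROUP (with GROUP replaced by the real group name) would let group members redirect output into the carrel without sudo.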