ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Can't create new carrel #108

Closed nkmeyers closed 4 years ago

nkmeyers commented 4 years ago

Tried to make a carrel but it didn't queue?

http://localhost:8080/search2queue.cgi?shortname=Antibody+&email=natalie.meyers%40nd.edu&query=%28covid*+AND+antibod*%29+AND+%28year%3A%222020%22+OR+year%3A%222019%22%29&queue=Queue+%28Step+%232+of+2%29


ericleasemorgan commented 4 years ago

Please try again.

Yesterday I enhanced an enhancement by Don, and along the way I broke things. I believe those broken things have been fixed. Please give it another go, but at first, in the name of immediate gratification, give it a go with a small carrel.

nkmeyers commented 4 years ago

FYI, I queued three jobs (2597, 2599, 2605) through the GUI at dr-cov19-web; all ran under username emorgan. Job 2599, making carrel BCG20200621, was running too long and erroring as shown below, so I cancelled it.

cord-36555-23xbwy62
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
cord-35717-bn15006x
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
. . .

Ditto for 2597; I cancelled that too.

I had hope for the cardiac arrest carrel, job 2605. This one was to create a carrel of only ~228 items, but it ran 6 hrs 21 min and then was erroring as shown below, so I cancelled it too:

cord-78813-sm2guhp7
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
sed: -e expression #1, char 14: unknown option to `s'
cord-99913-g9j53fl0
cord-86560-f1br2h6p
cord-44199-z0cy49tu
cord-116134-bonq48bq
cord-99758-idhvf9vr

nkmeyers commented 4 years ago

I created a single item carrel so we can see how long it takes to run and how many errors a single item query spawns.

Job number 2608, name = SingleItemCarrel, query -> bat AND year:"2019" AND entity:"cats" AND entity:"rats" AND keywords:"monkey"

took 17:21 minutes to run.

@ericleasemorgan can you check /export/reader/carrels/SingleItemCarrel/standard-error.txt

and

https://github.com/ericleasemorgan/reader/blob/8d3a9bb4e84a7da025b59fcee32144ad948b8ee5/bin/json2txt-pdf.sh where . sed s"

?

There are some other warnings in the error file, but if this is one of the scripts causing the processing errors, maybe we can reduce the time it takes to build carrels?
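For context on the error itself: GNU sed reports "unknown option to `s'" when it finds unexpected text where the flags of an `s` command should be. One common cause is a shell variable containing the delimiter character being interpolated into the expression. Whether that is what happens in json2txt-pdf.sh has not been confirmed here; the sketch below is a hypothetical illustration, and the placeholder `##DOI##`, the sample value, and the function names are all assumptions rather than code from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; NOT taken from json2txt-pdf.sh.

# Assumed sample value that happens to contain the "/" delimiter.
value='a/b'

# Fragile: the "/" inside $value closes the replacement early, so sed
# sees leftover text where it expects flags and rejects the expression.
fragile() {
  printf 'doi: ##DOI##\n' | sed -e "s/##DOI##/$value/"
}

# Escape "/" and "&" so they are treated literally in a replacement.
# (Backslashes in the value are not handled by this simple version.)
escape_replacement() {
  printf '%s' "$1" | sed -e 's/[\/&]/\\&/g'
}

# Safer: escape the value before interpolating it.
robust() {
  local escaped
  escaped=$(escape_replacement "$value")
  printf 'doi: ##DOI##\n' | sed -e "s/##DOI##/$escaped/"
}

fragile || echo "sed rejected the fragile expression"
robust
```

Choosing a delimiter that cannot occur in the data (for example `s|...|...|`) is another option, provided the data really cannot contain that character.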

ericleasemorgan commented 4 years ago

I don't know what is going on, but the carrel was created:

https://cord.distantreader.org/carrels/SingleItemCarrel/

The process took 3 minutes.

It takes 60 seconds for the cron job to look for more work to do. Depending on the number of items in the carrel, it takes 10 seconds to a few minutes to copy the necessary data to the carrel. It takes another 120 seconds for one of our virtual machines to spin up. It takes anywhere from a couple of minutes to many hours for the carrel to actually be created. Your single item carrel took 3 minutes, but most of that was computer warm-up time. I have a 10,000-item carrel and a 20,000-item carrel currently cooking. They have been running for close to 17 hours.
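As a rough worked example of the arithmetic above, the fixed overhead (cron polling plus virtual machine spin-up) can be added to the variable copy and processing times. This is only a sketch: the function name `estimate_total` and the sample copy and processing figures are assumptions for illustration, and the 60 second cron figure is treated as a worst case.

```shell
#!/usr/bin/env bash
# Overhead figures quoted in the comment above, in seconds.
cron_wait=60     # cron job polling interval (worst case)
vm_spinup=120    # virtual machine spin-up

# estimate_total COPY_SECONDS PROCESSING_SECONDS
# Prints the estimated wall-clock time in seconds.
estimate_total() {
  local copy=$1 processing=$2
  echo $(( cron_wait + copy + vm_spinup + processing ))
}

# Hypothetical small carrel: 10 s to copy, 60 s to process.
estimate_total 10 60
```

With no copy or processing time at all, the overhead alone comes to 180 seconds, which matches the observation that the single item carrel's 3 minutes was mostly warm-up.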

For your workshop, I suggest carrels of between a couple of dozen and about 100 items in size; processing will take less than 30 minutes. Also, keep in mind that, at the present time, we only have three virtual machines actually doing the work. If four people submit jobs, then the last submitter will have to wait until one of the first three is done.
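To make the waiting effect concrete, here is a small sketch that assigns jobs, in submission order, to whichever of a fixed number of workers frees up first. It is not the Distant Reader's actual scheduler; the function name `schedule` and the job durations (in minutes) are hypothetical.

```shell
#!/usr/bin/env bash
# schedule WORKERS DURATION...
# Prints the start and end time of each job, assuming jobs are taken in
# order by the worker that becomes free earliest.
schedule() {
  local workers=$1; shift
  local -a free_at=()
  local i w best d
  for ((i = 0; i < workers; i++)); do free_at[i]=0; done
  for d in "$@"; do
    best=0
    for ((w = 1; w < workers; w++)); do
      if (( free_at[w] < free_at[best] )); then best=$w; fi
    done
    echo "start=${free_at[best]} end=$(( free_at[best] + d ))"
    free_at[best]=$(( free_at[best] + d ))
  done
}

# Three workers, four hypothetical jobs of 30, 20, 40, and 25 minutes.
schedule 3 30 20 40 25
```

In this example the first three jobs start immediately, and the fourth starts only when the shortest of them (20 minutes) has finished.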

This morning, I created a carrel with 137 items in it, and it took forty minutes to complete.

'Off to manage GitHub stuff...

-- Eric

nkmeyers commented 4 years ago

Re-ran as job 2630, creating the carrel from the query: bat AND year:"2019" AND entity:"cats" AND entity:"rats" AND keywords:"monkey"

Good news: it took only 4:06 min, much better than yesterday's 17:21. BTW, no sed s" errors this time, hurray.