ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Enhance the search page so it includes additional fields and results are optionally displayed in a tabular form #95

Open ericleasemorgan opened 4 years ago

ericleasemorgan commented 4 years ago

The script reader:/bin/carrel2search.pl is used to create a study carrel's search interface (./htm/search.htm). It does this by querying the study carrel's database, updating a hash, transforming the hash into a JSON stream, opening a template file (reader:/etc/template-search.htm), replacing a token in the template with the JSON, and sending the resulting HTML to STDOUT.
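
A minimal Python sketch of the last steps of that pipeline (hash to JSON, JSON into the template), assuming the template contains the ##JSON## token mentioned later in this thread. render_search_page is a hypothetical name, not a function in the repository. The sketch also escapes "</" so that a value in the data cannot close the surrounding <script> element early.

```python
import json

JSON_TOKEN = "##JSON##"  # token name taken from this thread


def render_search_page(template: str, records: list[dict]) -> str:
    """Return the template with ##JSON## replaced by a JSON array of records."""
    if JSON_TOKEN not in template:
        raise ValueError("template does not contain " + JSON_TOKEN)
    # "</" can only occur inside JSON strings; "<\/" is an equivalent JSON escape
    # that keeps a value such as "</script>" from ending the <script> element.
    payload = json.dumps(records, ensure_ascii=False).replace("</", "<\\/")
    return template.replace(JSON_TOKEN, payload)
```

The Perl script does the equivalent with its own hash and template file; this only illustrates the token substitution, whose result would be printed to STDOUT.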

The first part of this issue is to edit reader:/bin/carrel2search.pl and enhance the JSON stream to include additional fields from the bib table of the carrel's database. (The bib table is documented in reader:/etc/reader.sql.) Start small; make your life easy. Include fields such as flesch, date, and words. At a later date, there will be additional fields such as URL and DOI.
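
One way to pull those extra fields, sketched in Python under two assumptions: that the carrel's database is SQLite, and that the bib table's columns are named id, author, title, date, flesch, and words (check reader:/etc/reader.sql for the real names). fetch_bib_records is a hypothetical helper.

```python
import sqlite3

# Columns to export; these names are assumptions, so check them against reader:/etc/reader.sql.
FIELDS = ("id", "author", "title", "date", "flesch", "words")


def fetch_bib_records(connection: sqlite3.Connection) -> list[dict]:
    """Return one dict per row of the bib table, limited to FIELDS and sorted by id."""
    connection.row_factory = sqlite3.Row  # rows can then be converted with dict()
    query = "SELECT {} FROM bib ORDER BY id".format(", ".join(FIELDS))
    return [dict(row) for row in connection.execute(query)]
```

The resulting list of dicts is exactly what a JSON encoder needs, and once URL and DOI exist, adding them would be a change to FIELDS alone.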

For testing purposes, you ought to be able to simply run carrel2search.pl and examine the output:

$ /export/reader/bin/carrel2search.pl tiny-carrel | less

Once the JSON has been enhanced, edit the template so search results display the additional fields, and please have the results display in a tagged format, such as the following:

The Source of COVID-19
  o abstract: This article described whence and where COVID-19
  o author(s): Smith, Jones; Edwards, Robert
  o readability: 89
  o length in words: 400
  o date: 2019-08-10
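
The mapping from record fields to those tagged labels can be sketched in Python as follows. The record keys (title, abstract, author, flesch, words, date) are assumptions about the JSON, format_tagged is a hypothetical name, and the real page would build HTML in the template's JavaScript rather than plain text. Fields a record lacks are simply skipped.

```python
# (JSON key, displayed label) pairs, in the order used in the example above.
LABELS = (
    ("abstract", "abstract"),
    ("author", "author(s)"),
    ("flesch", "readability"),
    ("words", "length in words"),
    ("date", "date"),
)


def format_tagged(record: dict) -> str:
    """Render one search result as a title line followed by 'o label: value' lines."""
    lines = [str(record.get("title", ""))]
    for key, label in LABELS:
        value = record.get(key)
        if value not in (None, ""):  # skip fields this record does not have
            lines.append("  o {}: {}".format(label, value))
    return "\n".join(lines)
```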

For extra credit, create a new link at the top of a search result page with a label like "View as table". Once clicked, the page ought to return tabular search results where each row is a found result and each column is a field such as identifier, date, flesch, length, etc. The tabular results are expected to use a JavaScript library called "DataTables", which is already used throughout the carrels. For example, see any carrel's bibliographics file (./htm/bibliographics.htm).
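
DataTables can be initialized from a plain array of rows plus a columns option whose entries carry a title. A Python sketch of that shape, assuming the search JSON uses the keys id, date, flesch, and words; datatables_options is hypothetical, and in practice the JavaScript on the page would build the same arrays from the found results.

```python
import json

# (JSON key, column heading) pairs; the keys are assumptions about the search JSON.
COLUMNS = (("id", "identifier"), ("date", "date"), ("flesch", "flesch"), ("words", "length"))


def datatables_options(records: list[dict]) -> str:
    """Build DataTables options: one row per found result, one column per field."""
    options = {
        "data": [[record.get(key, "") for key, _ in COLUMNS] for record in records],
        "columns": [{"title": heading} for _, heading in COLUMNS],
    }
    return json.dumps(options)
```

The page would then hand the object to something like $('#results').DataTable(options), where #results is a hypothetical table id.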

mcarro10 commented 4 years ago

edits.zip attached (changed files: etc/template-search.htm, bin/carrel2search.pl, bin/tsv2htm-search.py). tsv2htm-search.py is essentially the same as tsv2htm-bibliographics.py, but changed to fill the tables on the search page.

ericleasemorgan commented 4 years ago

At first glance, this looks like it will work, mostly. I did see a thing in one of the scripts where a variable was declared twice:

my $words = $$bibliographics{ 'words' };
my $words = $$bibliographics{ 'author' };

But that is a small issue.

'More later.

ericleasemorgan commented 4 years ago

I'm sorry, Mia, but I am unable to get this to 100% go. How am I supposed to use tsv2htm-search.py?

Here's what I did. First, I backed up both carrel2search.pl and template-search.htm.

Next, I installed your versions of carrel2search.pl and template-search.htm.

Third, I tweaked carrel2search.pl so it would both run and output additional fields like flesch, author, and words.

Fourth, using a carrel named test-tissues and the following commands, I re-created a search page:

$ ssh 149.165.170.42
$ cd /export/reader/carrels/test-tissues
$ ../../bin/carrel2search.pl test-tissues > ./htm/search.htm

When I did a search, nothing was returned. I perused the JavaScript error log and noticed that a variable (flesch) was not found. Looking at the template, I noticed that some of the variables, like flesch, were not quoted.

I tweaked the template by quoting the variables, and I recreated the search page. This time, when I searched, I got output, but the value of every author was "$author". Looking at the Perl code, I noticed that something was quoted when it should not have been. I tweaked it and re-created the search page. Results were returned as expected.
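
Both symptoms are typical of values being spliced into JavaScript by hand. A small Python illustration of the difference, with hypothetical helper names: a hand-built literal breaks on a name such as O'Brien, while a JSON encoder quotes and escapes every value. (In Perl, the JSON module's encode_json plays the same role.)

```python
import json


def js_record_by_hand(author: str, flesch: int) -> str:
    """Fragile: pastes values straight into JavaScript source."""
    return "{ author: '%s', flesch: %s }" % (author, flesch)


def js_record_with_json(author: str, flesch: int) -> str:
    """Robust: the JSON encoder quotes strings and escapes special characters."""
    return json.dumps({"author": author, "flesch": flesch})
```

With author="O'Brien", the first function yields { author: 'O'Brien', flesch: 70 }, which is a JavaScript syntax error; the second yields valid JSON that round-trips unchanged.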

Finally, I was not able to do View As List. I believe this did not work because I did not know how to employ tsv2htm-search.py, and because there is a token called ##ROWS## which has not been replaced with values.

Try the following URL to see your current implementation:

https://cord.distantreader.org/carrels/test-tissues/htm/search.htm

Attached are my modified versions of carrel2search.pl and template-search.htm.

Please tell me, how is tsv2htm-search.py expected to be used? What is its input?

-- Eric M.

carrel2search.txt

template-search.txt

mcarro10 commented 4 years ago

Hi Eric,

I am so sorry for all of this confusion. I had had a local directory that I was working in and I think that I ended up submitting the older versions from the local directory.

I run tsv2htm-search.py with the argument: [carrel name]/tsv/bibliographics.tsv.

So for example, I would run:

./tsv2htm-search.py [carrelname]/tsv/bibliographics.tsv > filename.htm
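
Going by this description (bibliographics.tsv as input, a template's ##ROWS## token filled in, HTML redirected to a file), the core of tsv2htm-search.py might look roughly like the Python sketch below. The column names are assumptions about the TSV header, fill_rows and tsv_to_rows are hypothetical names, and the real script, being based on tsv2htm-bibliographics.py, may differ.

```python
import csv
import html
import io

ROWS_TOKEN = "##ROWS##"
# Column names are assumed to match the header line of ./tsv/bibliographics.tsv.
COLUMNS = ("id", "author", "title", "date", "flesch", "words")


def tsv_to_rows(tsv_text: str) -> str:
    """Turn a tab-separated file with a header line into escaped HTML <tr> rows."""
    # QUOTE_NONE: treat double quotes in titles or abstracts as ordinary text.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        cells = "".join("<td>{}</td>".format(html.escape(record.get(c) or "")) for c in COLUMNS)
        rows.append("<tr>" + cells + "</tr>")
    return "\n".join(rows)


def fill_rows(template: str, tsv_text: str) -> str:
    """Replace the ##ROWS## token in an HTML template with table rows."""
    return template.replace(ROWS_TOKEN, tsv_to_rows(tsv_text))
```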

Hopefully this helps.

I apologize again for all of this confusion.

(In reply to Eric Lease Morgan's comment of Tue, Jun 23, 2020, at 3:21 PM.)

ericleasemorgan commented 4 years ago

I think our tasks are getting conflated. Let's work on just one at a time. Please send me the files to re-implement the search interface. Okay?

mcarro10 commented 4 years ago

edits.zip

searchsub.zip: Ok, I think this might be a little clearer. The edits folder contains the files necessary to implement the interface; the other folder contains the results of running the scripts.

mcarro10 commented 4 years ago

I think I realized something that might be contributing to all this confusion. In tsv2htm-search.py, the TEMPLATE variable at the top should be the file where the output of running carrel2search.pl is stored (so that the template already has the ##JSON## token replaced). Then, since bibliographics.tsv had all the data necessary to fill this table, I just used that TSV file as the input to ./tsv2htm-search.py. I should have said this sooner, sorry. Also, I am realizing that it would probably be better to make TEMPLATE an argument to the executable.
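
Making TEMPLATE an argument could look like this argparse sketch. The argument order (template first, TSV second) is an assumption, and rows_from_tsv is a simplified, hypothetical row builder that turns every tab-separated field into a cell.

```python
import argparse
import html
import sys

ROWS_TOKEN = "##ROWS##"


def build_parser() -> argparse.ArgumentParser:
    """Both inputs are positional arguments, so no path is hard-coded in the script."""
    parser = argparse.ArgumentParser(description="Replace the ##ROWS## token of a search page.")
    parser.add_argument("template", help="HTML already produced by carrel2search.pl")
    parser.add_argument("tsv", help="the carrel's ./tsv/bibliographics.tsv")
    return parser


def rows_from_tsv(tsv_text: str) -> str:
    """One <tr> per data line; every tab-separated field becomes an escaped <td>."""
    rows = []
    for line in tsv_text.splitlines()[1:]:  # skip the header line
        if line:
            cells = "".join("<td>{}</td>".format(html.escape(field)) for field in line.split("\t"))
            rows.append("<tr>" + cells + "</tr>")
    return "\n".join(rows)


def main(argv: list[str]) -> int:
    """Read both files and write the filled-in page to STDOUT."""
    args = build_parser().parse_args(argv)
    with open(args.template, encoding="utf-8") as handle:
        template = handle.read()
    with open(args.tsv, encoding="utf-8") as handle:
        tsv_text = handle.read()
    sys.stdout.write(template.replace(ROWS_TOKEN, rows_from_tsv(tsv_text)))
    return 0
```

Called from an `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` guard, a run might then be: ../../bin/tsv2htm-search.py /export/reader/tmp/search.htm ./tsv/bibliographics.tsv > ./htm/search.htm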

ericleasemorgan commented 4 years ago

Thank you for the clarification, and I have gotten your scripts to function, but the output is not really what is desired.

To get your efforts to function, I first saved the scripts in the /export/reader/bin directory and the template in the /export/reader/etc directory. I also created a temporary directory, /export/reader/tmp. I then changed directories to a carrel, and for a good time the carrel I used was /export/reader/carrels/test-tissues. I then ran the following command to create your intermediary template:

$ ../../bin/carrel2search.pl test-tissues > /export/reader/tmp/search.htm

This results in the replacement of the ##JSON## token.

I then edited tsv2htm-search.py so its template value is the output of carrel2search.pl.

I then ran the following command to replace the value of the ##ROWS## token:

$ ../../bin/tsv2htm-search.py ./tsv/bibliographics.tsv > ./htm/search.htm

This results in a functional search page, and you can see the results of these efforts here:

https://cord.distantreader.org/carrels/test-tissues/htm/search.htm

From there I can search for something like "lung", and get results. Yeah!

I can then select "table" mode, and I get a table, but I also get many additional listed items. In short, I got your good work to... work, but it is not the output I am expecting. I think I need to clarify, but when writing everything down in email, things get lost in translation.

Maybe we can talk on the telephone?

mcarro10 commented 4 years ago

Ok, I see! I can definitely talk on the phone today. I am actually leaving this afternoon to see my grandparents, so I will likely not be able to attend the meeting, but I am available to talk on the phone all day today or whenever works for you. My number is: 914-844-2693. In the meantime, I made the changes to the three bibliographies table and just uploaded them. I am starting the next part, allowing for additional viewing options, and will hopefully be done with it early next week.

(In reply to Eric Lease Morgan's comment of Fri, Jun 26, 2020, at 8:54 AM.)

mcarro10 commented 4 years ago

Modified the search page so that the table now responds to the list’s search bar, and the table and list now retain each other’s data after searches. This works for every search except the first, which is a bug that I will hopefully be able to fix by tomorrow. The other issue right now is that the table’s original search bar (the one without the button) is still visible in the upper right-hand corner, and I am still working on a way around this! searchedits.zip

mcarro10 commented 4 years ago

searchedits.zip: Slightly improved. I fixed the problem with the first search, so results should now work for all searches in both table and list view (the extra search field is still there).

ericleasemorgan commented 4 years ago

Mia, all of this is definitely a step in the right direction, and because it does not output invalid data, I will put it into production, but it still needs work. You can see an implementation here:

https://cord.distantreader.org/carrels/test-tissues/htm/search.htm

Search for something like "expressed".

To put it another way, you have made the whole thing MUCH more functional than my original implementation. Thank you. But it still suffers from a great deal of usability issues. You know what those issues are.

Please continue your development?

mcarro10 commented 4 years ago

Sounds good. I will continue to work on this!

(In reply to Eric Lease Morgan's comment of Fri, Jul 10, 2020, at 11:33 AM.)

ericleasemorgan commented 4 years ago


Thank you, times two. --Eric

mcarro10 commented 4 years ago

searchupdates.zip Improved search page - one search bar, and improvements in the table

ericleasemorgan commented 4 years ago

Thank you for giving it a go, but I was not really able to make it.... go.

I replaced the existing files with the files you sent me. I then recreated a search page. There was some sort of error reading the data. No results were listed in list format. The table view displayed, but the buttons do nothing. :-(

Attached ought to be a screen dump of the Javascript console. 'Make sense?

ericleasemorgan commented 4 years ago
mcarro10 commented 4 years ago

searchupdates.zip: I'm sorry, there was an error in my parsing function. Sorry about this. It should run now, I think.

ericleasemorgan commented 4 years ago

Very functional. For example, try:

I will add these to the repository. And for right now, you have earned a rest. Thank you.

mcarro10 commented 4 years ago

Sounds good - thank you!!

ericleasemorgan commented 4 years ago

Hmmm... Have you rested? ;-)

I have added three new fields to ./tsv/bibliographics.tsv, and they include: abstract, url, and doi. Attached ought to be a small sample file. In your copious spare time, please enhance both your search and bibliographics systems to:

1) display DOI
2) use abstract instead of summary, when abstract is available
3) link to URL instead of .txt, when URL is available
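
The three rules can be expressed as one small function. A Python sketch, assuming the TSV columns are named id, summary, abstract, url, and doi, and assuming a hypothetical ../txt/<id>.txt path for the plain-text fallback; display_fields is a hypothetical name, and DOIs are linked through the standard https://doi.org/ resolver.

```python
def display_fields(record: dict) -> dict:
    """Decide what one search or bibliographics entry shows, per the three requests."""
    abstract = (record.get("abstract") or "").strip()
    url = (record.get("url") or "").strip()
    doi = (record.get("doi") or "").strip()
    return {
        # 2) use the abstract when there is one, otherwise the existing summary
        "description": abstract or record.get("summary", ""),
        # 3) link to the URL when there is one, otherwise to the plain-text file
        #    (the ../txt/<id>.txt location is a hypothetical layout)
        "link": url or "../txt/{}.txt".format(record["id"]),
        # 1) display the DOI, linked through the doi.org resolver
        "doi": doi,
        "doi_link": "https://doi.org/" + doi if doi else "",
    }
```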

Okay?

ericleasemorgan commented 4 years ago

bibliographics.txt