ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Enhance bibliographic display and functionality #99

Closed ericleasemorgan closed 4 years ago

ericleasemorgan commented 4 years ago

At its core, the Reader ingests unstructured textual data and outputs a set of structured data. In our case, the textual data comes from scholarly journal articles. Thus, the Reader's input comes with bibliographic characteristics: authors, titles, dates, abstracts, DOIs, URLs, permissions (licenses), etc. Given a journal article, we are able to compute additional bibliographics such as but not limited to: keywords, readability scores, extent (size of document measured in words or sentences), sentiment, etc.

A carrel's bibliographic information is made accessible via an HTML file called "./htm/bibliographics.htm", for example:

https://cord.distantreader.org/carrels/tiny-carrel/htm/bibliographics.htm

This file is created through the use of two scripts and one template:

  1. reader:/bin/db2tsv-bibliographics.py
  2. reader:/bin/tsv2htm-bibliographics.py
  3. reader:/etc/tsv2htm-bibliographics.htm

The first script does the hard work, namely dumping all the desired bibliographic information to a tab-delimited file. The second script reads the tab-delimited file, creates a set of HTML tr/td combinations for each record, opens the template, replaces a token found in the template with the combinations, and sends the resulting HTML to STDOUT.
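The second script's flow (read the tab-delimited file, build one tr/td row per record, substitute the rows into the template, print) might be sketched roughly as follows. This is only an illustration, not the code in tsv2htm-bibliographics.py: the token `##ROWS##`, the inline template, and the function names are assumptions made up for the example.

```python
import csv
import html
import io

# Hypothetical token and template; the real ones live in
# reader:/etc/tsv2htm-bibliographics.htm and may look quite different.
TOKEN = "##ROWS##"
TEMPLATE = "<table><tbody>\n##ROWS##\n</tbody></table>"


def rows_to_html(tsv_handle):
    """Turn each record of a tab-delimited file into one <tr> of <td> cells."""
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    rows = []
    for record in reader:
        # Escape each value so stray <, >, or & cannot break the table markup
        cells = "".join(f"<td>{html.escape(value)}</td>" for value in record)
        rows.append(f"<tr>{cells}</tr>")
    return "\n".join(rows)


def render(tsv_handle, template=TEMPLATE, token=TOKEN):
    """Replace the token in the template with the generated rows."""
    return template.replace(token, rows_to_html(tsv_handle))


if __name__ == "__main__":
    sample = io.StringIO("id\tauthor\ttitle\ncord-001\tSmith\tA short study\n")
    print(render(sample))  # the real script also sends its HTML to STDOUT
```

Keeping row generation separate from template substitution means the template can change without touching the Python.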

Your mission, if you choose to accept it, is to simplify/enhance ./htm/bibliographics.htm in three ways. First, remove the column named "pages" because our data has zero bits of pagination. Second, remove the column called "cache" because its existence significantly hinders usability. Third, hyperlink the values in the column labeled "text" so they point to the corresponding plain text files found in the carrel's ./txt directory. To do this work, you will only need to edit tsv2htm-bibliographics.py and tsv2htm-bibliographics.htm. As of right now, no editing against db2tsv-bibliographics.py is necessary.
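One possible shape for the three changes, as a hedged sketch: it assumes each record is available as a dict keyed by column name, and that the "text" value is the name of the plain text file. Neither assumption is confirmed by the ticket, and `row_to_html` and `txt_dir` are hypothetical names.

```python
import html
from urllib.parse import quote

# Columns the ticket asks to drop from the table
DROP = {"pages", "cache"}


def row_to_html(record, txt_dir="../txt"):
    """Build one <tr> from a dict of column name -> value.

    The default txt_dir is relative because the page lives in ./htm and the
    plain text files live in the sibling ./txt directory of the same carrel.
    """
    cells = []
    for name, value in record.items():
        if name in DROP:
            continue  # skip "pages" and "cache" entirely
        label = html.escape(value)
        if name == "text":
            # Assumption: the value is a file name inside ./txt
            href = html.escape(f"{txt_dir}/{quote(value)}", quote=True)
            cells.append(f'<td><a href="{href}">{label}</a></td>')
        else:
            cells.append(f"<td>{label}</td>")
    return "<tr>" + "".join(cells) + "</tr>"
```

The matching "pages" and "cache" header cells would also have to be removed from the template so the header and body columns stay aligned.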

Once this is done, additional fields may be available for inclusion in the table. Fields may include DOI, URL, etc. In addition, next steps will be to exploit the functionality of the underlying JavaScript library (DataTables) for the purposes of filtering and sorting.

For extra credit, create a link at the top of ./htm/bibliographics.htm, and call it "View as bibliography". When the student, researcher, or scholar clicks the link, then the output will take the form of a narrative report -- a sort of annotated bibliography suitable for sharing or printing.

mcarro10 commented 4 years ago

bibchanges.zip

Changed files: reader:/etc/tsv2htm-bibliographics.htm and reader:/bin/tsv2htm-bibliographics.py. The Python script now requires two arguments (the TSV file and the path to the carrel). I did this because it seemed like a simple way to get the path to a given carrel, which is necessary to open the text files.

ericleasemorgan commented 4 years ago

I integrated your code into the system, and you can see an example of your good work, here:

https://cord.distantreader.org/carrels/test-tissues/htm/bibliographics.htm

Since we always know the location of the carrel in question, there is/was no need for the second command-line argument. I also removed some unnecessary modules, as well as changed the name of the hyperlink.

There are additional bibliographic items of interest, like URL and DOI. I will see about adding them to ./tsv/bibliographics. Once done, we can include them in ./htm/bibliographics.htm as well.

mcarro10 commented 4 years ago

annotatedbib.zip

I finally added the annotated bibliography option to the tsv2htm-bibliographics.htm file. Sorry that this took so long. I think it has the basic information in a simple format, but I assume it should probably also include summaries. My approach builds the bibliography from the data that is in the table, so I am still trying to figure out how to have summaries in the bibliography but not in the table. The other limitation of this approach is that if, for example, the user has selected 'view 10 items' in the table, then there will only be 10 items in the bibliography (I think this is an easier thing to change).
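For comparison, one way around both limitations would be to build the bibliography from the tab-delimited data rather than from the rows currently shown in the table. The following is only a sketch of that alternative, not what annotatedbib.zip does; the column names `author`, `title`, `date`, and `summary` are assumed for illustration and may not match ./tsv/bibliographics.tsv.

```python
import csv
import html
import io


def bibliography_entries(tsv_handle):
    """Return one HTML paragraph per record, independent of table paging."""
    reader = csv.DictReader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for record in reader:
        # Hypothetical column names; fall back to placeholders when empty
        author = html.escape(record.get("author") or "Unknown author")
        title = html.escape(record.get("title") or "Untitled")
        date = html.escape(record.get("date") or "n.d.")
        entry = f"<p>{author}. <em>{title}</em>. {date}."
        # A summary can appear here even if the table never shows it
        summary = (record.get("summary") or "").strip()
        if summary:
            entry += f"<br/>{html.escape(summary)}"
        entries.append(entry + "</p>")
    return entries


if __name__ == "__main__":
    sample = io.StringIO(
        "author\ttitle\tdate\tsummary\n"
        "Smith\tOn Masks\t2020\tA short note.\n"
    )
    print("\n".join(bibliography_entries(sample)))
```

Because every record in the file is read, the number of entries does not depend on how many rows the table is displaying.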

ericleasemorgan commented 4 years ago

Very good. Much much closer. You can see an example of your good work here:

https://cord.distantreader.org/carrels/test-culver/htm/bibliographics.htm

In the near future I/it will supply you with additional bibliographic information, such as journal title, DOI, URL, and a computed summary/author abstract.

When it comes to the summaries -- a long string of data -- DataTables includes a cool feature which toggles the display of an additional field. See https://datatables.net/examples/api/row_details.html In our case, the additional field could be the summary/abstract.

I will add your template to the repository and close this ticket, but after I have added additional data to the ./tsv/bibliographics.tsv, I will create a new ticket so the additional data can be added. At that time, I hope you will be interested in doing additional enhancements.

Good job.

mcarro10 commented 4 years ago

Ok, great! I will definitely be interested in adding additional fields. Thank you!

On Fri, Jul 10, 2020 at 8:18 AM Eric Lease Morgan notifications@github.com wrote:
