NaturalHistoryMuseum / scratchpads2

Scratchpads 2.0
http://scratchpads.org
GNU General Public License v2.0

Citations #4288

Closed informatics-dev closed 10 years ago

informatics-dev commented 11 years ago

Description:

The aim is to provide a way for users to cite a given page, such that:

The Scratchpads 1 implementation provides a block on the page which provides:

The cited, static version of the page is stored on the site itself.

Attachments:

informatics-dev commented 11 years ago

Comment by Alice Heaton

There are numerous paid services for this (as some countries' legal systems require particular organisations to archive their content) and a few free ones.

Of the free ones, the only one that specialises in scholarly citations is WebCite (http://webcitation.org). Its future is uncertain at the moment (they are currently raising funds to see if they can expand to accept new citations); however, of the free ones I have seen, it is the only one I would trust to be reliable in the long term. The reason is that it has a specific purpose (i.e. scholarly citations) and thus understands the need to ensure pages are kept forever. It is also the only one of the free services that deals with issues such as copyright.

WebCite may or may not continue to accept new submissions after 2013, but then it's hard to tell how long any of the free ones will last (e.g. bo.lt and backup url, which I have seen mentioned, have already disappeared).

Some background: https://en.wikipedia.org/wiki/Web_archiving

informatics-dev commented 11 years ago

Comment by Alice Heaton

I would expect that people who use this type of functionality a lot already have it set up: many of the existing services provide 'bookmarklet' style functionality that lets people easily archive a given page.

We would merely be providing a built-in version of that bookmarklet, which would be useful for people who do not normally use such archiving services. The real added value is generating the list of authors involved in a particular page, as the Scratchpads 1 version does.

We could also decide to provide our own service, specifically for Scratchpads. A simple version of this could use wkhtmltopdf to generate a PDF version of the page as it is at a given time.

I think for now the best plan is to:

If WebCite closes down, or we decide it is better to provide this service ourselves, we can change the backend without having to change everything else. The feature could even let users choose which backend they prefer, though I'd expect that people who have a preference already use an archiving service and would use that one anyway.
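To illustrate what "changing the backend without changing everything else" could look like, here is a minimal Python sketch (the real module is a Drupal module, so this is not its code). `ArchiveBackend`, `LocalPdfBackend`, `create_citation` and the URL scheme are all hypothetical names chosen for the example; the local backend only computes where a PDF (e.g. one rendered with wkhtmltopdf) would be published rather than rendering anything.

```python
from abc import ABC, abstractmethod
from urllib.parse import quote


class ArchiveBackend(ABC):
    """Hypothetical interface that every archiving backend implements."""

    name = "abstract"

    @abstractmethod
    def archive(self, url: str) -> str:
        """Store a snapshot of `url` and return the snapshot's URL."""


class LocalPdfBackend(ArchiveBackend):
    """Hypothetical stand-in for a local (e.g. wkhtmltopdf-based) backend."""

    name = "local"

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def archive(self, url: str) -> str:
        # A real backend would render the page to PDF here; this sketch
        # only computes where the file would be published.
        return f"{self.base_url}/{quote(url, safe='')}.pdf"


def create_citation(url, backends, preferred=None):
    """Archive `url` with the preferred backend, else the first enabled one."""
    if not backends:
        raise ValueError("no archiving backend is enabled")
    by_name = {b.name: b for b in backends}
    backend = by_name.get(preferred, backends[0])
    return backend.archive(url)
```

The rest of the feature only ever calls `create_citation`, so swapping WebCite for a self-hosted service would mean registering a different backend object rather than touching the block or author logic.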

informatics-dev commented 11 years ago

Comment by Alice Heaton

So I'm afraid WebCite is not very reliable at the moment. I tried to archive a page yesterday afternoon, and all I got was a MySQL "too many connections" error. I tried again this morning, and I'm getting the same error.

I do not have confidence in any of the other free ones, because they do not have a business model and they do not have a long term vision. If we are going to rely on a third party for long term archival, I would expect both. So most likely we are going to have to provide our own service.

informatics-dev commented 11 years ago

Comment by Alice Heaton

So leaving aside the archiving side of things, we still need to think about:

  1. How to display the cite-me link/button/block and associated information;
  2. Which pages to display it on;
  3. How to gather information about a page's authors.

For 2., I see that both Scratchpads 1 and antweb.org decided to only show the cite me button/block on selected pages. This definitely makes number 3 easier. A good approach is then probably to provide hooks so that any module can declare that it can parse the authors for a given path. It makes sense to work with paths rather than content types because (a) we are citing a path, and (b) the same content may or may not pull in other content depending on where it is displayed. This way we can start with selected pages and add more types of pages as needed.

This means that we need to answer '3' in specific contexts only:

For a given path we need the module to be able to say if it's a default handler or a specific handler. Default handlers would only get invoked if none of the specific handlers could parse the authors.
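The dispatch rule above (specific handlers first, default handlers only as a fallback) can be sketched as follows. This is a Python illustration, not the Drupal hook API: `AuthorRegistry`, `register` and `path_matches` are hypothetical names, and the `%` wildcard is assumed to match exactly one path segment, as in the `node/%` and `taxonomy/term/%/%` patterns mentioned later in the thread.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical handler type: given a path, return its authors or None.
AuthorHandler = Callable[[str], Optional[List[str]]]


def path_matches(pattern: str, path: str) -> bool:
    """Match a Drupal-style pattern where '%' stands for exactly one segment."""
    pattern_parts = pattern.split("/")
    path_parts = path.strip("/").split("/")
    return len(pattern_parts) == len(path_parts) and all(
        p == "%" or p == s for p, s in zip(pattern_parts, path_parts)
    )


class AuthorRegistry:
    """Hypothetical stand-in for the hook that modules would implement."""

    def __init__(self) -> None:
        self.specific: List[Tuple[str, AuthorHandler]] = []
        self.default: List[Tuple[str, AuthorHandler]] = []

    def register(self, pattern: str, handler: AuthorHandler, default: bool = False) -> None:
        (self.default if default else self.specific).append((pattern, handler))

    def authors(self, path: str) -> Optional[List[str]]:
        # Try every matching specific handler first; default handlers are
        # only consulted when none of the specific ones returned authors.
        for handlers in (self.specific, self.default):
            for pattern, handler in handlers:
                if path_matches(pattern, path):
                    found = handler(path)
                    if found:
                        return found
        return None
```

A handler signals "I can't parse this page" by returning `None` (or an empty list), which is what lets the default handlers take over.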

informatics-dev commented 11 years ago

Comment by Alice Heaton

So the final version has backends for webcite, archive.is, wget (local) and phantomjs (local).

By default, phantomjs is the only enabled backend; it generates PDF files and PNG previews. At the moment only node/% and taxonomy/term/%/% (i.e. biological classification) pages are handled, though more will probably be added after testing.

informatics-dev commented 11 years ago

Comment by Alice Heaton

So, this is (finally) ready for testing. It was delayed at the last minute because I switched from the wget backend (which created an HTML snapshot by downloading all files associated with the page) to the phantomjs backend (which generates a PDF from the page), which I had to implement.

Anyway, a few things to know:

Let me know if there are any problems. Oh, it's on branch 3130-citations.

informatics-dev commented 11 years ago

Comment by Dimitrios Koureas

The generated preview and PDF file are always a snapshot of the home page, not of the page requested. E.g. creating a citation for: http://dev-3110.taxon.name/content/ath-ath-14191 resulted in: http://dev-3110.taxon.name/sites/dev-3110.taxon.name/files/cite/2013-08-08/scratchpads-front-1.pdf

informatics-dev commented 11 years ago

Comment by Alice Heaton

The issue has been fixed.

The functionality works as expected for all logged in users.

The 'Create Citations' button will always work for anonymous users. The 'Preview' button will work the first time it is used, but if users leave the page and come back to it within a short time interval, it might not work again. This is due to caching, and reloading the page with CTRL+F5 will make it work again. Unfortunately there is no simple way to fix this. The options are:

  1. Leave it as it is;
  2. Do not show 'Preview' button for anonymous users;
  3. Do not use AJAX for the preview button;
  4. Reload the Cite Me block using AJAX to bypass cache restrictions.

Option (4) would be some work, and option (3) would make having a 'preview' button pointless. I would go for option (2): the original purpose of the 'preview' button was to show logged-in users how the page would look when cited (as the snapshot is generated as an anonymous user would see it).

informatics-dev commented 11 years ago

Comment by Dimitrios Koureas

Here are some notes from the first phase of testing (on http://dev-3110.taxon.name):

The Cite me functionality should not be available for Biblio nodes

Remove access to the Cite me block for maintainers

Generated title for taxon overview pages (i.e. http://dev-3110.taxon.name/checklist/thymus) should include authority (e.g. Thymus L.)

Authors' names should be displayed instead of their usernames.

When multiple authors have edited a page (through existing revisions), they should all be included in the authors list. The order should be: first, the person who created the node, followed by the rest in alphabetical order. In the Cite me block, when there are multiple authors, the first three should be mentioned followed by et al. The full list of authors should be included in the PDF file.
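The ordering rule and the "first three, then et al." display can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes author names are already formatted as "LastName F.", so that a plain string sort gives alphabetical order by last name.

```python
from typing import List


def order_authors(creator: str, revision_authors: List[str]) -> List[str]:
    """Creator first, then every other distinct revision author alphabetically."""
    others = sorted({a for a in revision_authors if a != creator})
    return [creator] + others


def block_authors(authors: List[str], limit: int = 3) -> str:
    """Short form for the Cite me block: first `limit` names, then 'et al.'."""
    if len(authors) <= limit:
        return ", ".join(authors)
    return ", ".join(authors[:limit]) + " et al."
```

The full list returned by `order_authors` would go into the PDF, while `block_authors` produces the abbreviated form for the block.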

When a taxonomic tree is included in the PDF, it should be expanded to the same level it was at when the Cite me button was pressed. Currently it is always included collapsed.

When a taxon overview page is cited, the title should also include the selected tab name. e.g. Thymus L.

Data coming from external sources (BHL, IUCN, Google Scholar, NCBI and EOL) should be omitted from the generated pdf.

In the generated PDF, the title of the page should be used as the title of the file.

When a page with a slickgrid (e.g. specimens) is cited, all records in the table should be included. (e.g. http://dev-3110.taxon.name/taxonomy/term/98/specimens)

PDFs with Google Maps do not display aggregated points correctly (http://cite.scratchpads.eu/dev-3110-taxon-name/2013-7-1/taxonomy-term-98-maps.pdf)

A message should always be included in the archived page; remove the "Include a message on the archived page" option from the maintainers.

When the Cite me button is clicked, it would be useful if we could generate some standard bibliographic formats (e.g. BibTeX, RIS, EndNote XML) with the reference data. I could provide an example of mapping node data to the bibliographic file fields.
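As an illustration of one of the formats mentioned, here is a Python sketch of an RIS export for a cited page. RIS records are lines of the form `TAG  - value` ending with an `ER` tag; the particular field mapping below (`ELEC` for a web page, `T2` for the Scratchpad title, `UR` for the archived PDF) is an assumption, not the mapping offered above, and `to_ris` is a hypothetical name.

```python
from typing import List


def to_ris(authors: List[str], year: int, title: str, site_title: str, pdf_url: str) -> str:
    """Render one cited page as an RIS record (assumed field mapping)."""
    lines = ["TY  - ELEC"]                              # ELEC: web page
    lines += [f"AU  - {name}" for name in authors]      # one AU line per author
    lines += [
        f"TI  - {title}",
        f"T2  - {site_title}",                          # the Scratchpad's title
        f"PY  - {year}",
        f"UR  - {pdf_url}",                             # archived PDF, not the live page
        "ER  - ",                                       # end of record
    ]
    return "\n".join(lines)
```

BibTeX and EndNote XML exports would follow the same pattern: one function per format, all fed from the same reference data.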

The preview button should be available only to registered users.

The included message in the generated pdf file should be formatted as follows:

This is a snapshot of the ["Title of site"] page located at [URL of page], generated on [Date] and permanently archived at [URL of pdf].

Please cite this page as: LastName F., LastName F. & LastName F. (2013) Title of Page. Title of Scratchpads. URL of pdf
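The two templates above can be filled in mechanically. A minimal Python sketch follows; the function names and the example values are hypothetical, and the author list is assumed to be already ordered and formatted as "LastName F.".

```python
from typing import List


def join_authors(authors: List[str]) -> str:
    """'A', 'A & B' or 'A, B & C', as in the requested citation format."""
    if len(authors) < 2:
        return "".join(authors)
    return ", ".join(authors[:-1]) + " & " + authors[-1]


def citation(authors: List[str], year: int, page_title: str, site_title: str, pdf_url: str) -> str:
    """'LastName F., LastName F. & LastName F. (2013) Title of Page. Title of Scratchpads. URL of pdf'."""
    return f"{join_authors(authors)} ({year}) {page_title}. {site_title}. {pdf_url}"


def snapshot_message(site_title: str, page_url: str, date: str, pdf_url: str) -> str:
    """The note placed in the generated PDF."""
    return (
        f'This is a snapshot of the "{site_title}" page located at {page_url}, '
        f"generated on {date} and permanently archived at {pdf_url}."
    )
```

Note that the citation points at the archived PDF URL rather than the live page, matching the template.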

that's all for now :)

informatics-dev commented 11 years ago

Comment by Alice Heaton

  • Remove access to the Cite me block for maintainers

Why?

  • When multiple authors have edited a page (through existing revisions) they should be included in the authors list. The order should be: First author: The person who created the node followed by the rest in alphabetical order. In the Cite me block when multiple authors, the first three should be mentioned followed by et al. The full list of authors should be included in the pdf file.

What do you do when several pieces of content with different authors are displayed on the same page? Who are the three most important ones?

  • Data coming from external sources (BHL, IUCN, Google Scholar, NCBI and EOL) should be omitted from the generated pdf.

Why so? Isn't that part of the page? In Scratchpads 1 I think that was one of the features.

  • When a page with a slickgrid (e.g. specimens) is cited, all records in the table should be included. (e.g. http://dev-3110.taxon.name/taxonomy/term/98/specimens)

Difficult, because SlickGrid only loads its data when you scroll down.

  • pdfs with google maps do not display aggregated points correctly (http://cite.scratchpads.eu/dev-3110-taxon-name/2013-7-1/taxonomy-term-98-maps.pdf)

That's a problem; there's not much I can do about the rendering.

informatics-dev commented 11 years ago

Comment by Alice Heaton

All the problems have been fixed and suggestions have been implemented, apart from:

  • Redmine issue 3258 (Citation: button should not be available for maintainers): I don't understand the point of this - can you explain further?
  • Redmine issue 3264 (Citations: remove external data): I don't understand the point of this - can you explain further?
  • Redmine issue 3265 (Citations: when citing slickgrids, include all records): I don't think this is a good idea - please see the issue for discussion.
  • Bug Redmine issue 3266 (Citations: google maps aggregated points problem): This is difficult - see the issue for more information.

informatics-dev commented 11 years ago

Comment by Alice Heaton

Redmine issue 3266 has been fixed - the other issues are in discussion, and are not blockers - so this can go to testing again.

informatics-dev commented 11 years ago

Comment by Alice Heaton

This is ready for testing again. Note there are three issues still under discussion (Redmine issues 3258, 3264 and 3265), but these are about features/functionality, not bugs.

The platform for testing is 3130-citations. There is a site already set up if you wish to use it: http://dev-3130.thymus.taxon.name

informatics-dev commented 11 years ago

Comment by Alice Heaton

Note for testers:

informatics-dev commented 11 years ago

Comment by Laurence Livermore

Add a space after labels: "Title:Thymus skopjensis (Checklist)" should be "Title: Thymus skopjensis (Checklist)"

Is there a reason to allow maintainers to edit this block?

Tested for:

Should character projects be citable?

Intensive serial testing (making multiple citations very quickly) mostly worked (only one "server unavailable" error).

References still need testing

informatics-dev commented 11 years ago

Comment by Laurence Livermore

Three bits of feedback:

informatics-dev commented 10 years ago

Comment by Alice Heaton

Laurence Livermore wrote:

  • Add a space after labels after clicking the "cite me" button "Title:Thymus skopjensis (Checklist)" should be "Title: Thymus skopjensis (Checklist)"

Ok.

  • Is there a reason to allow maintainers to edit this block?

Well, maintainers can edit all blocks; they have the "Administer Block" permission.

  • Should character projects be citable?

We can make them citable, but we'd only generate a snapshot of the page - no interaction.

informatics-dev commented 10 years ago

Comment by Alice Heaton

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

I can't recall the functionality of the following option: Service to create citation

Is there something I have forgotten?

informatics-dev commented 10 years ago

Comment by Alice Heaton

Dimitrios Koureas wrote:

I can't recall the functionality of the following option: Service to create citation

I'm not sure what you are referring to; I can't find this text on this page. Is it linked to a different issue?

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

Please see attached

informatics-dev commented 10 years ago

Comment by Alice Heaton

Oh, I see. This is only available if the maintainers have enabled more than one citation service (for instance, you can create the citation on WebCite, archive.is or on your own site).

This will not be enabled by default on other sites - it is only enabled on this site because it is a test site you created before I developed the cite.scratchpads.eu service.

I have now disabled the additional service on your test site.

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

The 'cite me' button is not available any more at dev-3130.thymus.taxon.name for anonymous or logged-in users. The issue remains after disabling and re-enabling the tool.

informatics-dev commented 10 years ago

Comment by Alice Heaton

It's there when I try. Remember it's not present on all pages, only:

If you still can't see the tab, can you give me an example page URL and let me know which browser you're using? Thanks.

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

The date shown when the cite-me button is clicked is not in agreement with the date of the node. For example for the page http://dev-3130.thymus.taxon.name/content/b-hw-100320345?citethispage=9f761

The date shown on top is Wed, 2012-12-19 12:21 while the date on the bottom of the page is Tue, 2012-12-18 17:11

If this is the last edit date, then this should be clearly marked as such.

informatics-dev commented 10 years ago

Comment by Alice Heaton

Dimitrios Koureas wrote:

The date shown when the cite-me button is clicked is not in agreement with the date of the node. For example for the page http://dev-3130.thymus.taxon.name/content/b-hw-100320345?citethispage=9f761

The date shown on top is Wed, 2012-12-19 12:21 while the date on the bottom of the page is Tue, 2012-12-18 17:11

If this is the last edit date, then this should be clearly marked as such.

Well, the date shown in the "cite me" block is guaranteed to be more recent than the last modification. So on a node page, where we know the last modified date, it is indeed the last modified date.

On a taxonomic term page, however, we do not know the last modified date (as the page is composed of multiple pieces of content), so the date displayed in the block is the current date (which by definition must be more recent than the last modification).
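The rule described in the two paragraphs above can be summarised as a small Python function. `citation_date` is a hypothetical name; `last_modified` is assumed to be `None` on composite pages such as taxonomic term pages, and the optional `now` argument exists only to make the behaviour easy to check.

```python
from datetime import datetime, timezone
from typing import Optional


def citation_date(last_modified: Optional[datetime], now: Optional[datetime] = None) -> datetime:
    """Date shown in the Cite me block.

    Node pages know their last-modified time; composite pages (e.g.
    taxonomic term pages) do not, so the current time is used instead.
    Either way the value is no older than the last change to the page.
    """
    if last_modified is not None:
        return last_modified
    return now if now is not None else datetime.now(timezone.utc)
```

This is why the displayed date can differ from the node's own dates: it is only guaranteed to be an upper bound on the last change, not a specific revision timestamp.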

I'm not sure how to mark the date as being "more recent than the last modification"; it's rather long text to add to the block. Should this be done on hover, maybe?

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

Discussed with Alice and decided to make the following changes:

Remove the date from the citation slide-down tab

Replace the "." after the node title in the reference citation example with "in"

Add at the end of the reference citation example "last accessed on 'Today's date'"

informatics-dev commented 10 years ago

Comment by Alice Heaton

Also, we noticed there was an issue on cite.scratchpads.eu: the citation there refers to the original page when it should refer to the PDF.

informatics-dev commented 10 years ago

Comment by Alice Heaton

Issues have been fixed. I used RFC 2822 for date formatting (e.g. "Tue, 08 Oct 2013 15:02:43 +0000"), which should be unambiguous (I realised that the module previously didn't include the time zone).
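For reference, this is how the RFC 2822 form above comes out of Python's standard library (`email.utils.format_datetime`); it illustrates the format only and is not the module's own code. `citation_timestamp` is a hypothetical wrapper that rejects timestamps without a time zone, since `format_datetime` would render those with the "-0000" (zone unknown) offset, which is the kind of ambiguity mentioned above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def citation_timestamp(dt: datetime) -> str:
    """Format `dt` as an RFC 2822 date, refusing times without a time zone."""
    if dt.tzinfo is None:
        # format_datetime would emit "-0000" (zone unknown) for naive values.
        raise ValueError("citation timestamps must be timezone-aware")
    return format_datetime(dt)


print(citation_timestamp(datetime(2013, 10, 8, 15, 2, 43, tzinfo=timezone.utc)))
# Tue, 08 Oct 2013 15:02:43 +0000
```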

informatics-dev commented 10 years ago

Comment by Dimitrios Koureas

Tested and works as expected

Note to support: This will have to be documented once released

informatics-dev commented 10 years ago

Comment by Simon Rycroft

Branch merged with master ready for today's release.