Closed informatics-dev closed 10 years ago
Comment by Alice Heaton
There are numerous paid services for this (as some countries' legal systems require particular organisations to archive their content) and a few free ones.
Of the free ones, the only one that specialises in scholarly citations is WebCite (http://webcitation.org). Its future is uncertain at the moment (they are currently raising funds to see if they can expand to accept new citations) - however, of the free ones I have seen, it is the only one I would trust as reliable in the long term. The reason for this is that it has a particular purpose (i.e. scholarly citations) and thus understands the need to ensure pages are kept forever. It is also the only one of the free services that deals with issues such as copyright.
WebCite may or may not continue to accept new submissions after 2013 - but then it's hard to tell how long any of the free ones will last (e.g. bo.lt and backup url, which I have seen mentioned, have already disappeared).
Some background: https://en.wikipedia.org/wiki/Web_archiving
Comment by Alice Heaton
I would expect people who would use this type of functionality a lot already have it set up - many of the existing services provide 'bookmarklet' style functionality, to allow people to easily archive a given page.
We would merely be providing a built-in version of that bookmarklet, which would be useful for people who do not normally use such archiving services. The real added bonus is to generate the list of authors involved on a particular page, as the Scratchpads 1 version does.
We could also decide to provide our own service, specifically for Scratchpads. A simple version of this could use wkhtmltopdf to generate a PDF version of the page as it is at a given time.
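As a rough illustration of the wkhtmltopdf idea, here is a minimal Python sketch. It assumes wkhtmltopdf is installed and relies only on its basic `wkhtmltopdf <url> <output.pdf>` invocation. The function names, the dated directory layout and the `slug` parameter are hypothetical and are not taken from any Scratchpads code.

```python
import shutil
import subprocess
from datetime import date
from pathlib import Path


def snapshot_command(url: str, out_dir: Path, slug: str, day: date) -> list[str]:
    """Build the wkhtmltopdf command line for a dated snapshot of `url`."""
    # Hypothetical layout: <out_dir>/<YYYY-MM-DD>/<slug>.pdf
    target = out_dir / day.isoformat() / f"{slug}.pdf"
    return ["wkhtmltopdf", url, str(target)]


def take_snapshot(url: str, out_dir: Path, slug: str) -> Path:
    """Render `url` to a PDF as it is right now; needs wkhtmltopdf on PATH."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    cmd = snapshot_command(url, out_dir, slug, date.today())
    target = Path(cmd[-1])
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return target
```

Keeping the date in the path means that repeated snapshots of the same page on different days do not overwrite each other.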
I would think that for now the best plan is to:
If WebCite closes down, or we think it better to provide this service ourselves, we can change the backend without having to change everything else. The feature could even allow users to choose which backend they prefer, though I'd expect people who have a preference are people who already use an archiving service and would use that one anyway.
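To illustrate the swappable-backend idea, here is a small Python sketch of a backend registry. Everything in it is an assumption made for illustration: the names `register_backend`, `archive`, "local" and "remote" are hypothetical, and the backends are placeholders that do no real archiving.

```python
from typing import Callable, Dict

# A backend takes a page URL and returns the URL of the archived copy.
Backend = Callable[[str], str]

BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str) -> Callable[[Backend], Backend]:
    """Decorator that makes a backend available under `name`."""
    def decorator(func: Backend) -> Backend:
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("local")
def local_backend(url: str) -> str:
    # Placeholder: a real backend would render and store a snapshot here.
    return "local-archive:" + url


@register_backend("remote")
def remote_backend(url: str) -> str:
    # Placeholder standing in for a third-party archiving service.
    return "remote-archive:" + url


def archive(url: str, backend: str = "local") -> str:
    """Archive `url` with the named backend; callers never see the details."""
    try:
        return BACKENDS[backend](url)
    except KeyError:
        raise ValueError(f"unknown backend: {backend}") from None
```

Because callers only go through `archive()`, replacing or adding a backend does not touch the rest of the feature.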
Comment by Alice Heaton
So I'm afraid WebCite is not very reliable at the moment. I tried yesterday afternoon to archive a page, and all I got was a MySQL "too many connections" error. I've tried again this morning, and I'm getting the same error.
I do not have confidence in any of the other free ones, because they do not have a business model and they do not have a long term vision. If we are going to rely on a third party for long term archival, I would expect both. So most likely we are going to have to provide our own service.
Comment by Alice Heaton
So leaving aside the archiving side of things, we still need to think about:
For 2. I see that both Scratchpads 1 and antweb.org decided to only show the cite me button/block on selected pages. This definitely makes number 3 easier. A good approach then is probably to provide hooks so that any module can declare that it can parse the authors for a given path (it makes sense to have this work with paths rather than content types - because 1. we are citing a path, and 2. the same content will or will not pull in other content depending on where it is displayed). This way we can work on selected pages at first, and then add more types of pages if needed as we go along.
This means that we need to answer '3' in specific contexts only:
For a given path we need the module to be able to say if it's a default handler or a specific handler. Default handlers would only get invoked if none of the specific handlers could parse the authors.
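The path-based handler scheme described above could be sketched roughly as follows. This is Python for illustration only, not the module's actual hook code: `Handler`, `matches` and `find_authors` are hypothetical names, and the assumption that `%` matches exactly one path segment is inferred from patterns such as node/% used elsewhere in this thread.

```python
from typing import Callable, List, NamedTuple, Optional

# A parser returns the list of authors, or None if it cannot parse the page.
AuthorParser = Callable[[str], Optional[List[str]]]


class Handler(NamedTuple):
    pattern: str              # e.g. "node/%"; "%" matches one path segment
    parse: AuthorParser
    is_default: bool = False  # default handlers are only tried as a fallback


def matches(pattern: str, path: str) -> bool:
    """True if every segment of `path` equals the pattern segment or hits '%'."""
    p_parts, parts = pattern.split("/"), path.split("/")
    return len(p_parts) == len(parts) and all(
        p == "%" or p == s for p, s in zip(p_parts, parts)
    )


def find_authors(path: str, handlers: List[Handler]) -> Optional[List[str]]:
    """Try specific handlers first, then default handlers."""
    candidates = [h for h in handlers if matches(h.pattern, path)]
    # sorted() is stable and False < True, so specific handlers come first
    # and registration order is preserved within each group.
    for handler in sorted(candidates, key=lambda h: h.is_default):
        authors = handler.parse(path)
        if authors is not None:
            return authors
    return None
```

A path with no matching handler returns None, which corresponds to not showing the cite me block on that page.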
Comment by Alice Heaton
So the final version has backends for WebCite, archive.is, wget (local) and phantomjs (local).
By default, phantomjs is the only enabled backend - it generates PDF files and PNG previews. At the moment only node/% and taxonomy/term/%/% (i.e. biological classification) pages are handled - though more will probably be added after testing.
Comment by Alice Heaton
So - this is (finally) ready for testing. It was delayed at the last minute as I switched the wget backend (which created an HTML snapshot by downloading all files associated with the page) to the phantomjs backend (which generates a PDF from the page), which I had to implement.
Anyway, a few things to know :
You can enable it on the tools page, under the 'Share' section ;
The 'Cite me' button is at the top of the page, as a slide down (next to Login / Language) ;
The block that comes down shows information about the page (its title, its authors and its last modified date) ;
The author/title/date is only shown in the block (it's not used for the snapshot) to help users know who to cite. This information is gathered in a number of different ways, depending on the page:
~ On node pages (biblio, specimen/observation, taxon description, etc.) it's simply the information from the node (author, last modified date, title) ;
~ On biological vocabulary pages (taxonomy/term/%/%) the date is the current date (it's hard to tell what the last modified date would be), the title is the title of the term, the authors are gathered from the list of all content (nodes and files) that is being displayed on the current tab.
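The two cases above can be sketched in Python as follows. The `Content` and `PageInfo` types and both function names are hypothetical, and the de-duplication of authors in order of first appearance is an assumption, since the thread does not say how the gathered list is ordered.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Content:
    title: str
    author: str
    modified: date


@dataclass
class PageInfo:
    title: str
    authors: List[str]
    shown_date: date


def node_page_info(node: Content) -> PageInfo:
    """Node pages: take title, author and last modified date from the node."""
    return PageInfo(node.title, [node.author], node.modified)


def term_page_info(term_title: str, displayed: List[Content], today: date) -> PageInfo:
    """Term pages: current date, term title, authors of all displayed content."""
    # dict.fromkeys keeps the first occurrence of each author, in order.
    authors = list(dict.fromkeys(item.author for item in displayed))
    return PageInfo(term_title, authors, today)
```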
While the author/date/title information is only there for information purposes, at the moment the Cite Me button only shows on pages for which we have such information (so node pages and taxonomy term pages). We could enable it for all pages (and not show any information), or for more pages but not all of them (in which case we will need to see, for each type of page, how we can gather information about authors/date/title ; this has to be done on a case by case basis). Let me know what you think about this.
Let me know if there are any problems. Oh, it's on branch 3130-citations.
Comment by Dimitrios Koureas
The generated preview and PDF file are always a snapshot of the home page and not of the page requested. E.g. creating a citation for: http://dev-3110.taxon.name/content/ath-ath-14191 resulted in: http://dev-3110.taxon.name/sites/dev-3110.taxon.name/files/cite/2013-08-08/scratchpads-front-1.pdf
Comment by Alice Heaton
The issue has been fixed.
The functionality works as expected for all logged in users.
The 'Create Citations' button will always work for anonymous users. The 'Preview' button will work the first time it is used - but if users leave the page, then come back to it within a short time interval, it might not work again. This is due to caching issues, and reloading the page with CTRL+F5 will make it work again. There is no simple way to fix this unfortunately. The options are:
Option (4) would be some work, and option (3) would make having a 'preview' button pointless. I would go for option (2) - the original purpose of the 'preview' button was to show logged in users how the page would look when cited (as the generated page is generated as an anonymous user would see it).
Comment by Dimitrios Koureas
Here are some notes from the first phase of testing (on http://dev-3110.taxon.name):
This is a snapshot of the ["Title of site"] page located at [URL of page], generated on [Date] and permanently archived at [URL of pdf].
Please cite this page as: LastName F., LastName F. & LastName F. (2013) Title of Page. Title of Scratchpads. URL of pdf
that's all for now :)
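For reference, the "Please cite this page as" template above can be produced by a small formatter. This is a Python sketch; `join_authors` and `format_citation` are hypothetical names, and the handling of one or two authors is an assumption, since the template only shows the three-author case.

```python
from typing import List


def join_authors(authors: List[str]) -> str:
    """Join authors as 'A, B & C', matching the template's punctuation."""
    if len(authors) <= 1:
        return "".join(authors)
    return ", ".join(authors[:-1]) + " & " + authors[-1]


def format_citation(authors: List[str], year: int, page_title: str,
                    site_title: str, pdf_url: str) -> str:
    """Build 'Authors (Year) Title of Page. Title of Scratchpads. URL of pdf'."""
    prefix = join_authors(authors)
    head = f"{prefix} ({year})" if prefix else f"({year})"
    return f"{head} {page_title}. {site_title}. {pdf_url}"
```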
Comment by Alice Heaton
- Remove access to the Cite me block for maintainers
Why ?
- When multiple authors have edited a page (through existing revisions) they should be included in the authors list. The order should be: First author: The person who created the node, followed by the rest in alphabetical order. In the Cite me block, when there are multiple authors, the first three should be mentioned followed by et al. The full list of authors should be included in the pdf file.
What do you do when several pieces of content with different authors are displayed on the same page ? Who are the three most important ones ?
- Data coming from external sources (BHL, IUCN, Google Scholar, NCBI and EOL) should be omitted from the generated pdf.
Why so ? That is part of the page ? In Scratchpads 1 that was one of the features I think.
- When a page with a slickgrid (e.g. specimens) is cited, all records in the table should be included. (e.g. http://dev-3110.taxon.name/taxonomy/term/98/specimens)
Difficult, because Slickgrid only loads its data when you scroll down.
- pdfs with google maps do not display aggregated points correctly (http://cite.scratchpads.eu/dev-3110-taxon-name/2013-7-1/taxonomy-term-98-maps.pdf)
That's a problem, there's not much I can do about the rendering.
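For a single node, the author-ordering rule quoted above (creator first, then the rest alphabetically, with the block showing the first three followed by et al.) could look like the Python sketch below. The function names are hypothetical, and the sketch deliberately does not answer the open question of what to do when several pieces of content with different authors appear on one page.

```python
from typing import Iterable, List


def order_authors(creator: str, revision_authors: Iterable[str]) -> List[str]:
    """Creator first, then every other distinct author alphabetically."""
    others = sorted({a for a in revision_authors if a != creator}, key=str.casefold)
    return [creator] + others


def short_author_list(authors: List[str], limit: int = 3) -> str:
    """Abbreviated list for the block; the full list would go in the PDF."""
    if len(authors) <= limit:
        return ", ".join(authors)
    return ", ".join(authors[:limit]) + " et al."
```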
Comment by Alice Heaton
All the problems have been fixed and suggestions have been implemented, apart from:
- Redmine issue 3258 (Citation: button should not be available for maintainers): I don't understand the point of this - can you explain further ?
- Redmine issue 3264 (Citations: remove external data): I don't understand the point of this - can you explain further ?
- Redmine issue 3265 (Citations: when citing slickgrids, include all records): I don't think this is a good idea - please see the issue for discussion.
- Redmine issue 3266 (Citations: google maps aggregated points problem): This is difficult - see the issue for more information.
Comment by Alice Heaton
Redmine issue 3266 has been fixed - the other issues are in discussion, and are not blockers - so this can go to testing again.
Comment by Alice Heaton
This is ready for testing again. Note there are three issues still under discussion (Redmine issue 3258, Redmine issue 3264, Redmine issue 3265), but these are about features/functionality, not bugs.
The platform for testing is 3130-citations. There is a site already set up if you wish to use it: http://dev-3130.thymus.taxon.name
Comment by Alice Heaton
Note for testers:
Comment by Laurence Livermore
Add a space after labels: "Title:Thymus skopjensis (Checklist)" should be "Title: Thymus skopjensis (Checklist)"
Is there a reason to allow maintainers to edit this block?
Tested for:
Should character projects be citable?
Intensive serial testing (making multiple citations very quickly) mostly worked (only one "server unavailable" error).
References still need testing
Comment by Laurence Livermore
Three bits of feedback:
Comment by Alice Heaton
Laurence Livermore wrote:
- Add a space after labels after clicking the "cite me" button: "Title:Thymus skopjensis (Checklist)" should be "Title: Thymus skopjensis (Checklist)"
Ok.
- Is there a reason to allow maintainers to edit this block?
Well maintainers can edit all blocks, they have the "Administer Block" permission.
- Should character projects be citable?
We can make them citable, but we'd only generate a snapshot of the page - no interaction.
Comment by Alice Heaton
Comment by Dimitrios Koureas
I can't recall the functionality of the following option: Service to create citation
Is there something I forget?
Comment by Alice Heaton
Dimitrios Koureas wrote:
I can't recall the functionality of the following option: Service to create citation
I'm not sure what you are referring to - I can't find this text on this page. Is that linked to a different issue ?
Comment by Dimitrios Koureas
Please see attached
Comment by Alice Heaton
Oh I see - this is only available if the maintainers have enabled more than one citation service (for instance you can create the citation on webcite.org, archive.is or on your own site).
This will not be enabled by default on other sites - it is only enabled on this site because it is a test site you created before I developed the cite.scratchpads.eu service.
I have now disabled the additional service on your test site.
Comment by Dimitrios Koureas
The 'cite me' button is not available any more at dev-3130.thymus.taxon.name for anonymous or logged-in users. The issue remains after disabling and re-enabling the tool.
Comment by Alice Heaton
It's there when I try. Remember it's not present on all pages - only:
If you still can't see the tab, can you give me an example page URL and let me know which browser you're using ? Thanks.
Comment by Dimitrios Koureas
The date shown when the cite-me button is clicked is not in agreement with the date of the node. For example for the page http://dev-3130.thymus.taxon.name/content/b-hw-100320345?citethispage=9f761
The date shown on top is Wed, 2012-12-19 12:21 while the date on the bottom of the page is Tue, 2012-12-18 17:11
If this is the last edit date, then this should be clearly marked as such.
Comment by Alice Heaton
Dimitrios Koureas wrote:
The date shown when the cite-me button is clicked is not in agreement with the date of the node. For example for the page http://dev-3130.thymus.taxon.name/content/b-hw-100320345?citethispage=9f761
The date shown on top is Wed, 2012-12-19 12:21 while the date on the bottom of the page is Tue, 2012-12-18 17:11
If this is the last edit date, then this should be clearly marked as such.
Well the date it shows in the "cite me" block is guaranteed to be more recent than the last modification. So on a node page, where we know the last modified date, this is indeed the last modified date.
On a taxonomic term page however we do not know the last modified date (as the page is composed of multiple content), so the date displayed in the block is the current date (which by definition must be more recent than the last modification).
I'm not sure how to mark the date as being "more recent than the last modification" - it's rather long text to add to the block. Should this be done on hover maybe ?
Comment by Dimitrios Koureas
Discussed with Alice and decided to make the following changes:
Comment by Alice Heaton
Also, we noticed there was an issue on cite.scratchpads.eu: the citation there refers to the original page when it should refer to the PDF.
Comment by Alice Heaton
Issues have been fixed. I used RFC 2822 for date formatting (e.g. "Tue, 08 Oct 2013 15:02:43 +0000"), which should be unambiguous (I realised that the module previously didn't include the time zone).
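For anyone who wants to reproduce that format, here is a short Python illustration using the standard library's `email.utils`. This is not the module's own code, only a way to show what an RFC 2822 date with an explicit time zone looks like.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# A timezone-aware timestamp; the offset is written out numerically.
stamp = datetime(2013, 10, 8, 15, 2, 43, tzinfo=timezone.utc)
formatted = format_datetime(stamp)
print(formatted)  # Tue, 08 Oct 2013 15:02:43 +0000

# The string parses back to the same instant, so nothing is lost.
round_trip = parsedate_to_datetime(formatted)
```

Because the offset is part of the string, the date can be read back without guessing which time zone the server was in.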
Comment by Dimitrios Koureas
Tested and works as expected
Note to support: This will have to be documented once released
Comment by Simon Rycroft
Branch merged with master ready for today's release.
Description:
The aim is to provide a way for users to cite a given page, such that:
The Scratchpads 1 implementation provides a block on the page which provides:
The cited, static version of the page is stored on the site itself.
Attachments: