gbhl / bhl-europe

Biodiversity Heritage Library Europe
http://www.bhl-europe.eu/

3.2.1 - Download single pages/ plates/ figures of document (Content Download) #99

Closed. janahoffmann closed this issue 12 years ago.

janahoffmann commented 13 years ago

Click on a single page (figure/plate) in the table of contents (TOC) or enter page numbers in the box, then download the selected pages.

Precondition: correct pagination

janahoffmann commented 13 years ago

Previous comments: issue #19, Access – Web service for generating different formats for derivatives generated in realtime for different page ranges: PDF

Download selected pages in PDF format; enter page numbers in the field (1-25). The user interface is issue #16.

leenamba commented

To test, go to this URL and enter different ranges:

http://bhl-alexandria.nhm.ac.uk/fedora/objects/demo%3Adarwin/methods

Use the demo:book2pdf field, enter the range and click run.

example: 1-10,20-22,53-46

Known bugs at the moment: input containing whitespace, and single page numbers (rather than ranges).

You can also put the ranges directly in the URL of the method.

example: http://bhl-alexandria.nhm.ac.uk/fedora/objects/demo%3Adarwin/methods/demo%3Abook2pdf/pdf?ranges=40-42
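A minimal sketch of building such a request URL in Python, assuming the pattern in the example above (`/objects/<pid>/methods/demo%3Abook2pdf/pdf?ranges=...`) holds for other objects too. The helper name `book2pdf_url` is hypothetical, and the code only builds the URL string; it does not contact the test server.

```python
from urllib.parse import quote, urlencode

# Base path taken from the example URL above.
BASE = "http://bhl-alexandria.nhm.ac.uk/fedora/objects"


def book2pdf_url(pid: str, ranges: str) -> str:
    """Build a demo:book2pdf request URL for a Fedora object (hypothetical helper)."""
    # Fedora PIDs such as "demo:darwin" contain ':', escaped as %3A in the example.
    obj = quote(pid, safe="")
    method = quote("demo:book2pdf", safe="")
    # urlencode percent-encodes the range string (',' becomes %2C, ' ' becomes '+').
    query = urlencode({"ranges": ranges})
    return f"{BASE}/{obj}/methods/{method}/pdf?{query}"


print(book2pdf_url("demo:darwin", "40-42"))
```

For `"40-42"` this reproduces the example URL exactly. For a multi-range value such as `1-10,20-22` the comma is sent as `%2C`; standard query-string decoding should restore it on the server side, but that is an assumption, not something verified against the service.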

ZhengLIAtos commented

The bugs have been fixed. Entering only whitespace is treated as the whole range of the book. Single numbers such as 1,4,6 can be used as the parameter, even mixed with other ranges. For each range, if the lower bound is greater than the upper bound, the bounds are rearranged in ascending order (e.g. 10-1 => 1-10). If a bound is out of range, it is rewritten to lie within the book's range (e.g. 1-10000 => 1-114).

Here are some examples that everybody can try:
1-10, 3, 40-42
10-1, 50, 60
(several whitespace characters)
100-1000000
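The parsing rules described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the code behind demo:book2pdf; it assumes 1-based image sequence numbers, a known page count, and that the user's order and any duplicates are kept as entered (the thread does not say).

```python
def parse_ranges(spec: str, total_pages: int) -> list[int]:
    """Expand a range string such as "1-10, 3, 40-42" into 1-based page numbers."""
    # Blank or whitespace-only input selects the whole book.
    if not spec.strip():
        return list(range(1, total_pages + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas, e.g. "1-3,,5"
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
        else:
            low = high = int(part)  # a single page number
        # Reversed bounds are put into ascending order: 10-1 => 1-10.
        if low > high:
            low, high = high, low
        # Out-of-range bounds are clamped to the book: 1-10000 => 1-114.
        low = min(max(low, 1), total_pages)
        high = min(max(high, 1), total_pages)
        pages.extend(range(low, high + 1))
    return pages
```

With a 114-page book (as in the 1-10000 => 1-114 example), `parse_ranges("10-1, 50, 60", 114)` yields pages 1 to 10 followed by 50 and 60, and `parse_ranges("100-1000000", 114)` yields pages 100 to 114. Malformed parts such as `abc` raise `ValueError`.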

chrisfreeland commented

Users are going to need to be able to request pages by the page number printed on the page, not by the sequence number of the page as scanned from the front of the book to the back. They need "Pages 1-3", not images 1-3. Plus, they will want to get pages from all parts of the book (a title page, the pages of an article, the plate and map at the back of the book), which is why we have a GUI for the same process, as at http://www.biodiversitylibrary.org/pdfgen/69526
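To illustrate the difference between printed page numbers and scan sequence numbers, here is a minimal sketch that resolves printed page labels to image positions. It assumes per-image page-label metadata is available, which, as noted later in this thread, will not always be the case; the function name and the sample labels are hypothetical.

```python
def labels_to_sequence(requested: list[str], page_labels: list[str]) -> list[int]:
    """Map printed page labels (e.g. "1", "iv", "Plate 3") to 1-based scan positions."""
    index: dict[str, int] = {}
    for seq, label in enumerate(page_labels, start=1):
        # Keep the first scan carrying a label if the same label appears twice.
        index.setdefault(label, seq)

    missing = [label for label in requested if label not in index]
    if missing:
        raise ValueError(f"no scanned page carries the label(s): {missing}")
    return [index[label] for label in requested]


# Hypothetical volume: front matter precedes printed page 1, plate at the back.
labels = ["Cover", "Title page", "i", "ii", "1", "2", "3", "Plate 1"]
print(labels_to_sequence(["1", "2", "3"], labels))  # [5, 6, 7]
```

So "Pages 1-3" of this volume are images 5-7, and one request can mix parts of the book: `["Title page", "1", "Plate 1"]` maps to `[2, 5, 8]`.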

leenamba commented

Yes, we will need a UI, which is covered in a separate issue, #16. @janahoffmann @JiriFrank We're going to need some specifications here based on Chris' comments. Please advise.

janahoffmann commented 13 years ago

Previous comments: issue #18, Access – Web service for generating different formats for derivatives generated in realtime for different page ranges: Image (jp2/jpg)

ZhengLIAtos is assigned

Download selected pages in image (jp2 or jpg) format; enter page numbers in the field (1-25).
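The thread does not specify how several selected images are delivered; a later test report mentions zipped files, so here is a minimal sketch, assuming a ZIP archive, that packs already-rendered page images into one in-memory ZIP using Python's standard `zipfile` module. The function name and the `page_0001.jpg` naming scheme are hypothetical.

```python
import io
import zipfile


def zip_pages(images: dict[int, bytes], ext: str = "jpg") -> bytes:
    """Pack page images, keyed by 1-based sequence number, into one ZIP archive."""
    buffer = io.BytesIO()
    # JPEG and JP2 data are already compressed, so the entries are stored as-is.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for seq in sorted(images):
            # Zero-padded names keep the pages in reading order in file browsers.
            archive.writestr(f"page_{seq:04d}.{ext}", images[seq])
    return buffer.getvalue()
```

Storing rather than deflating avoids spending CPU time on image data that would barely shrink.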

janahoffmann commented 13 years ago

Previous comments: issue #17, Access – Web service for generating different formats for derivatives generated in realtime for different page ranges: OCR

ZhengLIAtos is assigned

Download selected pages in OCR format; enter page numbers in the field (1-25).

janahoffmann commented 13 years ago

Previous comment: issue #16, Portal – Interface for downloading realtime derivatives: PDF

hengdi is assigned

Download one page in PDF format. Download a book in PDF format.

janahoffmann commented 13 years ago

@leenamba It is exactly as Chris described. If we do not have proper pagination in the metadata, we can do it the same way BHLUS does.

This is an important note: Are you generating a PDF containing the text of a single journal article or book chapter? If so, please provide title and author information. BHL stores generated PDFs and author and title data will allow these PDFs to be indexed, searched and retrieved by other users. If you download an article but do not provide title or author information, these articles will be lost.

I think we will be able to improve this process. Comments and suggestions are welcome.

zhengl commented 13 years ago

Right now, we have provided the foundation for downloading PDFs by sequence number. If actual page numbers are required as parameters, we should wait and see what kind of metadata is stored in the AIP during the ingest process.

In addition, the PDFs do not contain text, only images (you can try the example). Storing and indexing generated PDFs for the sake of efficiency would be a separate matter. As I see it, we don't need to do so: all PDFs are generated in real time, and all information about a book or journal is stored in the metadata, so users don't need to enter it.

audreyhzhang commented 13 years ago

@janahoffmann We generate PDFs from images (JP2 or TIFF format), because these are stored in our Fedora repository as the original datastream, which should be preserved. For the moment, the AIP we use for ingestion contains 3 parts:

As for the title and author information, we can provide it from the Fedora repository metadata, but then it is not clear which work we should do and which work the Search part will do. As I understand it, the workflow is: the user searches for a book with the Search feature --> the search engine returns all information about the book (title, author, description, etc.) together with a link for downloading the book in different formats (PDF, image, OCR).
So should the task for this issue mainly be to provide this download link?

Please advise.

janahoffmann commented 13 years ago

@ZhengLIAtos Well, unfortunately a book (volume) may contain several articles, and sometimes these articles are not annotated in the metadata, because there is a great difference between cataloguing by librarians and the actual needs of scientists. The librarian regards the book as an item and might not list all the articles included in a volume, whereas the scientist wants only the article in the respective book. A search would find only the book (volume), not the article. The user finds the article by browsing the entire volume and its page numbers (usually the user already has the page information, e.g. volume number and pages). The user then wants to select a page range for download (the actual range of the article, which may or may not be indexed in the metadata). This is why we need the option to let users decide which pages to download. BHLUS uses the information provided by one user (author, real page range and title of the article) to save and index the request. The next time, a user will be able to find the article already requested for download and won't have to fill in the template. I think this is a clever way to do it. However, if correct pagination is provided and the metadata goes down to article level within a volume, then we don't have this issue, because users will find the article directly by search. So I agree this is another story :-) and you might have to wait and see what comes with the metadata or AIP.
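A minimal sketch of the BHLUS approach described above: the first user who downloads an article supplies title, author and printed page range, and that record is kept so later users can find the article by search. The class and field names are hypothetical; a real portal would persist the records and index them with its search engine rather than scan an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArticleRecord:
    volume_id: str  # hypothetical identifier of the scanned volume
    title: str
    author: str
    pages: str      # printed page range as entered by the user, e.g. "101-115"


@dataclass
class ArticleIndex:
    records: list[ArticleRecord] = field(default_factory=list)

    def save(self, record: ArticleRecord) -> bool:
        """Store a user-described article once per (volume, page range)."""
        key = (record.volume_id, record.pages)
        if any((r.volume_id, r.pages) == key for r in self.records):
            return False  # already described by an earlier user
        self.records.append(record)
        return True

    def search(self, term: str) -> list[ArticleRecord]:
        """Case-insensitive substring match on title or author."""
        term = term.lower()
        return [r for r in self.records
                if term in r.title.lower() or term in r.author.lower()]
```

Keying on volume plus page range means a second user requesting the same pages reuses the first description instead of filling in the template again.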

@hengdi This issue refers to the content viewer, not the search. If a table of contents is provided for a volume (in the content viewer; see the BHLUS portal), the user can select a particular page, plate or illustration and download single pages or a range of pages (see the explanation above). This is not related to search but to downloading from the content viewer. The direct download link from the search results is a separate issue, #61.

JiriFrank commented 13 years ago

See comment #83.

janahoffmann commented 13 years ago

Not implemented; the issue needs to be reopened.

janahoffmann commented 13 years ago

Please follow up on this issue.

leenamba commented 12 years ago

The page ranges will be based on the book viewer's pages, not on the book's printed page numbers, since we will not always have page-number metadata.

leenamba commented 12 years ago

@janahoffmann

leenamba commented 12 years ago

A single page or several pages can be selected for download.

JiriFrank commented 12 years ago

3.2.1 - Download single pages/ plates/ figures of document (Content Download)

COR number: 3.2.1
Testing platform: http://bhl-test.nhm.ac.uk/portal/
Function: Content viewer

Description:

Click on a single page (figure/plate) in the table of contents (TOC) or enter page numbers in the box, then download the selected pages.

You can reach the content viewer by clicking "Read book" in the result list.

@AnneSch @AntonioGVH @fwelter @grahamhrbge1670 @heimor @HenningScholz @JFTester @JiriFrank @LarissaS @RalfH

RalfH commented 12 years ago

It works. Nice functionality. But the quality of the PDFs (even when high quality is chosen) is very variable. Flora Malesiana, for example, results in a surprisingly low-quality PDF. On the other hand, coloured illustrations from older floras come out quite well in PDF.

-----Original Message----- From: JiriFrank [mailto:reply@reply.github.com] Sent: Tuesday, 17 January 2012 15:22 To: Hand, Ralf Subject: Re: [bhle] 3.2.1 - Download single pages/ plates/ figures of document (Content Download) (#99)

Reply to this email directly or view it on GitHub: https://github.com/bhle/bhle/issues/99#issuecomment-3528434

JiriFrank commented 12 years ago

3.2.1 - Download single pages/ plates/ figures of document (Content Download)

COR number: 3.2.1
Testing platform: http://bhl-test.nhm.ac.uk/portal/
Function: Content viewer

Description:

Click on a single page (figure/plate) in the table of contents (TOC) or enter page numbers in the box, then download the selected pages.

You can reach the content viewer by clicking "Read book" in the result list.

@AntonioGVH @fwelter @LarissaS

LarissaS commented 12 years ago

I did not understand how it works! Or does it not work?

-----Original Message----- From: JiriFrank [mailto:reply@reply.github.com] Sent: 19 January 2012 11:56 To: Smirnova Larissa Subject: Re: [bhle] 3.2.1 - Download single pages/ plates/ figures of document (Content Download) (#99)

Reply to this email directly or view it on GitHub: https://github.com/bhle/bhle/issues/99#issuecomment-3561021



LarissaS commented 12 years ago

OK, I found it! It is done with an icon on the page itself: you have to add the page to the download basket first! It may be necessary to write a tutorial for that. But it works!
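A minimal sketch of such a download basket, assuming pages are identified by their position in the book viewer (as decided earlier in this thread) and that the selection is sent to the server as a compact range string in the same `1-10,20-22` style as the demo:book2pdf `ranges` parameter. The class and method names are hypothetical.

```python
class DownloadBasket:
    """Collects viewer page positions selected one at a time via the page icon."""

    def __init__(self) -> None:
        self._pages: set[int] = set()

    def add(self, page: int) -> None:
        self._pages.add(page)  # adding a page twice has no effect

    def remove(self, page: int) -> None:
        self._pages.discard(page)

    def to_ranges(self) -> str:
        """Collapse the selection into runs, e.g. {1, 2, 3, 7} -> "1-3,7"."""
        parts: list[str] = []
        ordered = sorted(self._pages)
        i = 0
        while i < len(ordered):
            start = end = ordered[i]
            # Extend the run while the next page is consecutive.
            while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
                i += 1
                end = ordered[i]
            parts.append(str(start) if start == end else f"{start}-{end}")
            i += 1
        return ",".join(parts)
```

Using a set means the order in which pages are clicked does not matter, and consecutive pages collapse into a single range.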


HenningScholz commented 12 years ago

Works fine.


From: JiriFrank [mailto:reply@reply.github.com] Sent: Tue 17.01.2012 15:22 To: Scholz, Henning Subject: Re: [bhle] 3.2.1 - Download single pages/ plates/ figures of document (Content Download) (#99)


grahamhrbge1670 commented 12 years ago

I can download pages and save the downloaded files, but for some reason I cannot open the zipped files: when I try to extract them, I get a message that the files are blocked.

AntonioGVH commented 12 years ago

I tried this before.

It works perfectly!


JiriFrank commented 12 years ago

It works properly.

Help will be part of the tutorial for the content viewer and will be finalized after the content viewer is integrated into the portal. @LarissaS