Open xing93111 opened 6 years ago
Thanks for submitting the issue, @xing93111.
Further detail: If MIK is run instead with the class CdmCompound, compound objects are generated with the directory structure of a Book, except each page is a PDF (instead of a TIFF). These PDFs are OK (not corrupt).
As far as we understand, the CdmPdfDocuments class is supposed to merge these page-level PDFs into a single aggregated PDF. The result is a corrupted PDF.
Is there anything wrong with the configuration? Or is there a flaw in the toolchain?
I can't see anything wrong with the configuration. This particular toolchain relies on CONTENTdm's internal functionality to merge the PDF pages into a single document. It used to work fine - for example the PDFs in https://ecuad.arcabc.ca/islandora/object/ecuad%3Acals were generated using it, with this .ini file: https://github.com/MarcusBarnes/mik/blob/master/extras/samples/calendars_config.ini That said, the filegetter has probably not been tested since the major code cleanup that happened after SFU used the toolchain.
The code that fetches the assembled PDF content is here. I suggest dumping the value of the URL generated here and then running it using curl to see whether the PDF it produces is corrupted.
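One quick way to judge the file that curl saves is to look at its first bytes, since a well-formed PDF starts with the signature `%PDF-`. The following is a minimal sketch, not part of MIK; the function name `is_pdf` and the file name `merged.pdf` are assumptions made for illustration.

```shell
# Succeeds (exit status 0) when the file starts with the PDF signature "%PDF-".
is_pdf() {
  # Read only the first five bytes of the file and compare them to the signature.
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Hypothetical usage: save the response from the generated URL, then test it.
#   curl -s -o merged.pdf "$url"
#   if is_pdf merged.pdf; then echo "looks like a PDF"; else echo "not a PDF"; fi
```

A file that fails this check may still be worth opening in a text editor, because the leading bytes usually show what the server actually returned.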
The configuration file here uses CdmPhpDocuments, but I don't see that class in the MIK source code. Where can I find the file?
@xing93111, sorry, that config file was an early one and predates #223. The configuration should use CdmPdfDocuments in lines 22 and 29.
... and I've just updated https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs. Very sorry about that.
I used a text editor to open the generated PDF file and found it is not a PDF at all but an XML file. For example, the following is the content of the generated PDF file related to this object: http://digicon.athabascau.ca/cdm/ref/collection/auarchives/id/499
<?xml version="1.0"?>
<cpd>
<type>Document</type>
<page>
<pagetitle>Page 1</pagetitle>
<pagefile>485.pdf</pagefile>
<pageptr>484</pageptr>
</page>
<page>
<pagetitle>Page 2</pagetitle>
<pagefile>486.pdf</pagefile>
<pageptr>485</pageptr>
</page>
<page>
<pagetitle>Page 3</pagetitle>
<pagefile>487.pdf</pagefile>
<pageptr>486</pageptr>
</page>
<page>
<pagetitle>Page 4</pagetitle>
<pagefile>488.pdf</pagefile>
<pageptr>487</pageptr>
</page>
<page>
<pagetitle>Page 5</pagetitle>
<pagefile>489.pdf</pagefile>
<pageptr>488</pageptr>
</page>
<page>
<pagetitle>Page 6</pagetitle>
<pagefile>490.pdf</pagefile>
<pageptr>489</pageptr>
</page>
<page>
<pagetitle>Page 7</pagetitle>
<pagefile>491.pdf</pagefile>
<pageptr>490</pageptr>
</page>
<page>
<pagetitle>Page 8</pagetitle>
<pagefile>492.pdf</pagefile>
<pageptr>491</pageptr>
</page>
<page>
<pagetitle>Page 9</pagetitle>
<pagefile>493.pdf</pagefile>
<pageptr>492</pageptr>
</page>
<page>
<pagetitle>Page 10</pagetitle>
<pagefile>494.pdf</pagefile>
<pageptr>493</pageptr>
</page>
<page>
<pagetitle>Page 11</pagetitle>
<pagefile>495.pdf</pagefile>
<pageptr>494</pageptr>
</page>
<page>
<pagetitle>Page 12</pagetitle>
<pagefile>496.pdf</pagefile>
<pageptr>495</pageptr>
</page>
<page>
<pagetitle>Page 13</pagetitle>
<pagefile>497.pdf</pagefile>
<pageptr>496</pageptr>
</page>
<page>
<pagetitle>Page 14</pagetitle>
<pagefile>498.pdf</pagefile>
<pageptr>497</pageptr>
</page>
<page>
<pagetitle>Page 15</pagetitle>
<pagefile>499.pdf</pagefile>
<pageptr>498</pageptr>
</page>
</cpd>
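To see at a glance which page-level files a structure document like the one above refers to, the `<pagefile>` values can be pulled out with sed. This is only a sketch: it assumes each `<pagefile>` element sits on its own line, as it does in the output above, and the function name `list_pagefiles` is made up for illustration.

```shell
# Print one <pagefile> value per line from a CONTENTdm compound-object XML file.
# Assumes every <pagefile> element is on a single line of its own.
list_pagefiles() {
  sed -n 's:.*<pagefile>\(.*\)</pagefile>.*:\1:p' "$1"
}

# Hypothetical usage against the file MIK wrote with a .pdf extension:
#   list_pagefiles 499.pdf
```

If this prints a list of page files, the "PDF" is really the compound object's structure file and not a merged document.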
We need to establish that CONTENTdm still supports the ability to join PDF pages into a single multipage PDF file (it may have changed since this code was written). To do that we need to create a request URL using the code below (from here):
$get_file_url = $this->utilsUrl .'getdownloaditem/collection/'
. $this->alias . '/id/' . $pointer . '/type/compoundobject/show/1/cpdtype/document-pdf/filename/'
. $document_structure['page'][0]['pagefile'] . '/width/0/height/0/mapsto/pdf/filesize/0/title/'
. urlencode($document_structure['page'][0]['pagetitle']);
and see if we get a PDF from the server. So that would look like:
http://yourcdmutilsurl/getdownloaditem/collection/auarchives/id/499/type/compoundobject/show/1/cpdtype/document-pdf/filename/485.pdf/width/0/height/0/mapsto/pdf/filesize/0/title/Page%201
If you use curl to get that URL, what does the resulting file look like?
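For testing outside MIK, the same request URL can be assembled in the shell from its parts. The sketch below mirrors the PHP concatenation quoted above; the function name `build_getdownloaditem_url` and its argument order are assumptions. It writes a space in the title as `%20`, matching the example URL (PHP's `urlencode()` would emit `+` instead), and it does not encode any other reserved characters.

```shell
# Assemble the getdownloaditem URL for a compound PDF document.
# Arguments: utils URL, collection alias, pointer, first page file, first page title.
build_getdownloaditem_url() {
  # Percent-encode spaces only; this sketch leaves other characters untouched.
  encoded_title=$(printf '%s' "$5" | sed 's/ /%20/g')
  # "${1%/}" drops one trailing slash so the path is joined with exactly one "/".
  printf '%s\n' "${1%/}/getdownloaditem/collection/$2/id/$3/type/compoundobject/show/1/cpdtype/document-pdf/filename/$4/width/0/height/0/mapsto/pdf/filesize/0/title/${encoded_title}"
}

# Hypothetical usage:
#   url=$(build_getdownloaditem_url "http://yourcdmutilsurl/" auarchives 499 485.pdf "Page 1")
#   curl -s -o merged.pdf "$url"
```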
If you don't mind sharing your CONTENTdm API URL with me I can take a look.
@bondjimbond has the URL but it requires a VPN connection. URL: http://deck.cs.athabascau.ca/dmwebservices/index.php?q=
@mjordan Here is the output:
billg@lib10:~$ curl http://digicon.athabascau.ca/getdownloaditem/collection/auarchives/id/499/type/compoundobject/show/1/cpdtype/document-pdf/filename/485.pdf/width/0/height/0/mapsto/pdf/filesize/0/title/Page%201
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" class="no-js">
<!-- CONTENTdm Version 6.8.0.412s/6.8.0.761w (c) OCLC 2011-2018. All Rights Reserved. //-->
<head>
<meta name="robots" content="noindex,nofollow,noarchive" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel="shortcut icon" type="image/x-icon" href="/ui/custom/default/collection/default/images/favicon.ico?version=1404943627" />
<title>CONTENTdm Title</title>
<script type="text/javascript">
var cdmHttps = 'off';
var cdmInsecureWebsitePort = '';
var cdmSecureWebsitePort = '';
</script>
<link rel="stylesheet" type="text/css" href="/ui/custom/default/collection/default/css/main.css?version=1529334550" />
<link type="text/css" href="/utils/getstaticcontent/file/js~bt~jquery.bt.css/type/stylesheet" rel="stylesheet" />
<link type="text/css" href="/utils/getstaticcontent/file/js~skins~tango~skin.css/type/stylesheet" rel="stylesheet" />
<link type="text/css" href="/utils/getstaticcontent/file/js~skins~cdm~skin.css/version/1401946701/type/stylesheet" rel="stylesheet" />
<style>
.line_breaker, pre {
white-space: pre;
white-space: pre-wrap;
white-space: pre-line;
white-space: -pre-wrap;
white-space: -o-pre-wrap;
white-space: -moz-pre-wrap;
white-space: -hp-pre-wrap;
word-wrap: break-word;
}
</style>
<!-- NEW JQUERY and UI -->
<script type="text/javascript" src="/utils/getstaticcontent/file/js~jquery_1.7.2~jquery-1.7.2.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~jquery_1.7.2~jquery-ui-1.8.20.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~jquery-ui-togglebox.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~jquery.hoverIntent.minified.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~jquery.scrollTo-min.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~default.js/version/1401946702/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~modernizr-latest.js/type/javascript"></script>
<!--[if lt IE 10]>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~cdmOldInternetExplorerChecker.js/type/javascript"></script>
<![endif]-->
<script type="text/javascript" src="/utils/getstaticcontent/file/js~bt~jquery.bt.min.js/type/javascript"></script>
<script type="text/javascript" src="/utils/getstaticcontent/file/js~quickview.js/type/javascript"></script>
<!--[if IE]>
<script type="text/javascript" src="/ui/cdm/default/collection/default/js/excanvas.compiled.js"></script>
<![endif]-->
<!--[if IE 7]>
<link href="/ui/cdm/default/collection/default/css/ie7.css" type="text/css" rel="stylesheet" />
<![endif]-->
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-6471153-5');
ga('send', 'pageview');
</script>
<script type="text/javascript" src="/ui/cdm/default/collection/default/js/cdm_ga.js"></script>
</head>
<body>
<a name="top"></a>
<!-- HEADER -->
<div id="headerWrapper" tabindex="1000">
<p><img src="/ui/custom/default/collection/default/images/digiport_banner6.jpg" alt="" /></p>
<span class="clear"></span>
</div>
<!-- NAV_TOP -->
<div id="nav_top">
<div id="nav_top_left">
<ul class="nav">
<li class="nav_li">
<a tabindex="1001" id="nav_top_left_first_link" href="http://digiport.athabascau.ca" >
<div class="nav_top_left_text_container">Home</div>
</a>
</li>
<li class="nav_li">
<a tabindex="1002" href="/cdm/" >
<div class="nav_top_left_text_container">Browse</div>
</a>
</li>
<li class="nav_li">
<a tabindex="1003" href="http://digicon.athabascau.ca/cdm4/help.php" >
<div class="nav_top_left_text_container">Help</div>
</a>
</li>
<li class="nav_li">
<a tabindex="1004" href="http://digiport.athabascau.ca/copyright.html" >
<div class="nav_top_left_text_container">Copyright</div>
</a>
</li>
<li class="nav_li">
<a tabindex="1005" href="http://library.athabascau.ca" >
<div class="nav_top_left_text_container">Athabasca University Library</div>
</a>
</li>
<li class="nav_li">
<a tabindex="1006" href="http://digiport.athabascau.ca/" >
<div class="nav_top_left_text_container">Digitization Portal</div>
</a>
</li>
</ul>
</div>
<div id="nav_top_right">
<ul class="nav">
<li class="nav_li_right_1">
<span class=""><!--<a href="javascript:session_check(fx);" id="debug_session_check">Session Check</a> - <a href="javascript:session_auth();" id="debug_session_auth">Session Auth</a> - <a href="javascript:session_deauth();" id="debug_session_de-auth">Session De-Auth</a> - -->
<span class="currentUser" id="currentUser"></span><a tabindex="1007" id="login_link" href="http://digicon.athabascau.ca/login/" data-analytics='{"category":"navigation","action":"click","label":"Log in link"}'>Log in</a>
</span>
</li>
<li class="nav_li_right_1 nav_top_right_divider">|</li>
<li class="nav_li_right_1">
<span class="icon_10 icon_nav_top_right ui-icon-help cdmHelpLink"></span><a tabindex="1008" class="cdmHelpLink" href="javascript:;" data-analytics='{"category":"navigation","action":"click","label":"Help link"}'><b>Help</b></a>
</li>
<li class="nav_li_right_1 nav_top_right_divider">|</li>
<li class="nav_li_right_1">
<div id="nav_top_right_language_dd_link">
<a tabindex="1009" href="javascript:;" id="nav_top_right_language_dd_link_text" data-analytics='{"category":"navigation","action":"open","label":"language selection menu"}'>
English </a><span class="icon_10 icon_nav_top_right ui-icon-triangle-1-s"></span>
</div>
<br />
<div id="nav_top_right_language_dd_container">
<div id="nav_top_right_language_dd_content">
<div tabindex="1010" class="language_option cdm_selected_language" lang="en_US" data-analytics='{"category":"navigation","action":"click","label":"language: English"}'>English</div>
<div tabindex="1011" class="language_option " lang="de" data-analytics='{"category":"navigation","action":"click","label":"language: Deutsch"}'>Deutsch</div>
<div tabindex="1012" class="language_option " lang="es" data-analytics='{"category":"navigation","action":"click","label":"language: Español"}'>Español</div>
<div tabindex="1013" class="language_option " lang="en_PIRATE" data-analytics='{"category":"navigation","action":"click","label":"language: Pirate English"}'>Pirate English</div>
<div tabindex="1014" class="language_option " lang="ko" data-analytics='{"category":"navigation","action":"click","label":"language: 한국어 Korean"}'>한국어 Korean</div>
<div tabindex="1015" class="language_option " lang="fr" data-analytics='{"category":"navigation","action":"click","label":"language: Français"}'>Français</div>
</div>
<span class="clear"></span>
</div>
</li>
</ul>
</div>
</div>
<!-- BEGIN TOP CONTENT -->
<div id="top_content">
<div style="height:400px;width:500px;margin:0 auto;" valign="top">
<div id="cdm_error" style="height:24px;width:500px;" class="float_left spacePad10 spaceMar30T ui-state-error ui-corner-all">
<span class="icon_10 ui-icon-alert ui-icon-alert-cdmerror"></span>
404: Page not found </div>
</div> </div>
<!-- END TOP CONTENT -->
<!-- FOOTER -->
<span class="clear"></span>
<div id="cdmFooterWrapper" class="spaceMar20T">
<div id="nav_footer">
<div id="nav_footer_left">
<ul class="nav">
<li class="nav_footer_li"><a href="/cdm/">Home</a></li>
<li class="nav_footer_left_divider">|</li>
<li class="nav_footer_li"><a href="/cdm/about">About</a></li>
<li class="nav_footer_left_divider">|</li>
<li class="nav_footer_li"><a href="mailto:digi@athabascau.ca">Contact us</a></li>
</ul>
</div>
<div id="nav_footer_right"><ul class="nav">
<li class="nav_footer_li"><a href="http://www.contentdm.org/" data-analytics='{"category":"navigation","action":"click","label":"Powered by CONTENTdm® link"}'>Powered by CONTENTdm®</a></li></ul>
</div>
<br /><br />
</div>
<span class="clear"></span>
</div>
<div id="login_dialog" title="Login" dialog_name="login_dialog"></div>
<span class="clear"></span>
<div id="content_footer"></div>
<!-- language fields -->
<input type="hidden" id="cdm_language_and" value="and" />
<input type="hidden" id="cdm_language_or" value="or" />
<input type="hidden" id="cdm_language_in" value="in" />
<input type="hidden" id="cdm_language_advancedsearch" value="Advanced Search" />
<input type="hidden" id="cdm_language_closeadvancedsearch" value="Close Advanced Search" />
<input type="hidden" id="cdm_language_allofthewords" value="All of the words" />
<input type="hidden" id="cdm_language_anyofthewords" value="Any of the words" />
<input type="hidden" id="cdm_language_noneofthewords" value="None of the words" />
<input type="hidden" id="cdm_language_theexactphrase" value="The exact phrase" />
<input type="hidden" id="cdm_language_allfields" value="All fields" />
<input type="hidden" id="cdm_language_error_enterAWordOrPhrase" value="Enter a word or phrase" />
<input type="hidden" id="cdm_language_addorremovecollections" value="Add or remove collections" />
<input type="hidden" id="cdm_language_limitsearchtospecificcollections" value="Limit search to specific collections" />
<input type="hidden" id="cdm_language_failedtoretrieveitem" value="Failed to retrieve the item." />
<input type="hidden" id="cdm_language_therewasaproblemrefreshingtheimage" value="therewasaproblemrefreshingtheimage" />
<input type="hidden" id="cdm_language_close" value="Close" />
<input type="hidden" id="cdm_language_login" value="Log in" />
<input type="hidden" id="cdm_language_logout" value="Log out" />
<input type="hidden" id="cdm_language_username" value="User Name" />
<input type="hidden" id="cdm_language_password" value="Password" />
<input type="hidden" id="cdm_language_cancel" value="Cancel" />
<input type="hidden" id="cdm_language_ok" value="OK" />
<input type="hidden" id="cdm_language_authenticating" value="Authenticating" />
<input type="hidden" id="cdm_language_loading" value="loading..." />
<input type="hidden" id="cdm_language_allCollections" value="All Collections" />
<input type="hidden" id="cdm_language_remove" value="remove" />
<input type="hidden" id="cdm_language_plus" value="Plus" />
<input type="hidden" id="cdm_language_more" value="more" />
<input type="hidden" id="cdm_language_foundindocument" value="found in document" />
<input type="hidden" id="cdm_language_for" value="for" />
<input type="hidden" id="cdm_language_error_nousernameentered" value="Please enter a user name." />
<input type="hidden" id="cdm_language_error_nopasswordentered" value="Please enter a password" />
<input type="hidden" id="cdm_language_error_authenticationfailed" value="Authentication Failed\nThe user name and/or password is not recognized.\nPlease check the spelling and try again." />
<!-- end language fields -->
</body>
</html>
You need the 'utils' subdirectory. Try:
curl http://digicon.athabascau.ca/utils/getdownloaditem/collection/auarchives/id/499/type/compoundobject/show/1/cpdtype/document-pdf/filename/485.pdf/width/0/height/0/mapsto/pdf/filesize/0/title/Page%201
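Because the missing `utils` segment is easy to overlook, a small helper can normalize the base URL before the rest of the path is appended. This is a sketch only; `ensure_utils_url` is a hypothetical name, and it assumes the utilities live directly under `/utils/` on the site, as in the command above.

```shell
# Print the site URL with a trailing "/utils/" segment, adding it only when absent.
ensure_utils_url() {
  base="${1%/}"   # drop one trailing slash, if present
  case "$base" in
    */utils) printf '%s/\n' "$base" ;;
    *)       printf '%s/utils/\n' "$base" ;;
  esac
}

# Hypothetical usage:
#   ensure_utils_url "http://digicon.athabascau.ca"
```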
This is the response:
billg@lib10:~$ curl http://digicon.athabascau.ca/utils/getdownloaditem/collection/auarchives/id/499/type/compoundobject/show/1/cpdtype/document-pdf/filename/485.pdf/width/0/height/0/mapsto/pdf/filesize/0/title/Page%201
<?xml version="1.0"?>
<cpd>
<type>Document</type>
<page>
<pagetitle>Page 1</pagetitle>
<pagefile>485.pdf</pagefile>
<pageptr>484</pageptr>
</page>
<page>
<pagetitle>Page 2</pagetitle>
<pagefile>486.pdf</pagefile>
<pageptr>485</pageptr>
</page>
<page>
<pagetitle>Page 3</pagetitle>
<pagefile>487.pdf</pagefile>
<pageptr>486</pageptr>
</page>
<page>
<pagetitle>Page 4</pagetitle>
<pagefile>488.pdf</pagefile>
<pageptr>487</pageptr>
</page>
<page>
<pagetitle>Page 5</pagetitle>
<pagefile>489.pdf</pagefile>
<pageptr>488</pageptr>
</page>
<page>
<pagetitle>Page 6</pagetitle>
<pagefile>490.pdf</pagefile>
<pageptr>489</pageptr>
</page>
<page>
<pagetitle>Page 7</pagetitle>
<pagefile>491.pdf</pagefile>
<pageptr>490</pageptr>
</page>
<page>
<pagetitle>Page 8</pagetitle>
<pagefile>492.pdf</pagefile>
<pageptr>491</pageptr>
</page>
<page>
<pagetitle>Page 9</pagetitle>
<pagefile>493.pdf</pagefile>
<pageptr>492</pageptr>
</page>
<page>
<pagetitle>Page 10</pagetitle>
<pagefile>494.pdf</pagefile>
<pageptr>493</pageptr>
</page>
<page>
<pagetitle>Page 11</pagetitle>
<pagefile>495.pdf</pagefile>
<pageptr>494</pageptr>
</page>
<page>
<pagetitle>Page 12</pagetitle>
<pagefile>496.pdf</pagefile>
<pageptr>495</pageptr>
</page>
<page>
<pagetitle>Page 13</pagetitle>
<pagefile>497.pdf</pagefile>
<pageptr>496</pageptr>
</page>
<page>
<pagetitle>Page 14</pagetitle>
<pagefile>498.pdf</pagefile>
<pageptr>497</pageptr>
</page>
<page>
<pagetitle>Page 15</pagetitle>
<pagefile>499.pdf</pagefile>
<pageptr>498</pageptr>
</page>
</cpd>
At http://digicon.athabascau.ca/cdm/ref/collection/auarchives/id/499, if I wanted to download the entire document as a single PDF, how would I do that? I don't see a link that will allow me to do that. Is there an admin option that turns off that feature, and if so, do you have it turned off?
I don't see a button for downloading the entire compound object as a single PDF file, and I can't find a backend option to turn one on or off. However, this object does have a download link: http://digicon.athabascau.ca/cdm/ref/collection/auriver/id/454. But I think it is a single object rather than a compound one.
Correct, that is a single-file object, not a compound.
I think the manipulator has some problems. If I configure it like:
fetchermanipulators[] = "CdmCompound|Document-PDF"
It does not work because the output of the MIK is:
Commencing MIK.
Filtering 2 records through the CdmCompound fetcher manipulator.
==========================================================================================> 100%
Creating 0 Islandora ingest packages. Please be patient.
It just filtered out the two records in the collection. Then, I changed the manipulator like this:
fetchermanipulators[] = "CdmCompound|Document"
because I found the object types are
65,compound,Document
586,compound,Document
It does work but again I get corrupted PDF files because they are indeed XML files.
So I think the manipulators section on this page: https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs should not be restricted to
fetchermanipulators[] = "CdmCompound|Document-PDF"
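The behaviour described above is what an exact match on the object type would produce: both records are typed `Document`, so a filter for `Document-PDF` keeps none of them. The sketch below only illustrates that reasoning and is not MIK's actual implementation. It assumes a comma-separated list in the `pointer,compound,type` form shown above, and `filter_by_type` is a hypothetical name.

```shell
# Print the pointers of records whose third comma-separated field equals the wanted type.
# Arguments: wanted type, file with lines such as "65,compound,Document".
filter_by_type() {
  awk -F, -v want="$1" '$3 == want { print $1 }' "$2"
}

# Hypothetical usage:
#   filter_by_type "Document-PDF" types.csv   # prints nothing for the two records above
#   filter_by_type "Document" types.csv       # prints both pointers
```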
@xing93111 can you test compound PDF documents with MIK as it stood prior to #223 and the work that brought MIK in line with coding standards? Try commit 9c6b8c537f477fd82f20f3c6ba2563fcd30bd7f5. The compound PDF toolchain code at that commit is essentially how it stood when SFU migrated its compound PDFs (as far as the compound PDF document code anyway). You will need to adjust your .ini file to use CdmPhpDocuments and not CdmPdfDocuments (which is what #223 fixed).
If this works for you, then there is a problem with the current MIK code that we need to fix; if it doesn't, then we need to confirm that your CONTENTdm can produce a single multipage PDF from single-page PDFs (which we have not done yet) and go from there.
@MarcusBarnes does this seem like a reasonable way of narrowing down the problem?
Does anyone know of another CONTENTdm instance that we can test against?
@mjordan I don't see a class named CdmPhpDocuments on this page: https://github.com/MarcusBarnes/mik/tree/9c6b8c537f477fd82f20f3c6ba2563fcd30bd7f5/src/filegetters. I assume this is the commit you would like me to check out. If the class does not exist, the command will definitely fail.
@xing93111 You're looking at the current code rather than the code from the earlier commit. In your MIK directory:
git checkout -b CdmPhpDocuments
Then you'll need to git reset --hard 9c6b8c5
This will take you to the earlier commit... Look in src/filegetters to see what the filename is.
I still don't see the class. Here are my commands:
billg@lib10:/data4/test$ git clone https://github.com/MarcusBarnes/mik.git
Cloning into 'mik'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 5254 (delta 6), reused 10 (delta 4), pack-reused 5236
Receiving objects: 100% (5254/5254), 1.47 MiB | 0 bytes/s, done.
Resolving deltas: 100% (3468/3468), done.
Checking connectivity... done.
billg@lib10:/data4/test$ ls
mik
billg@lib10:/data4/test$ cd mik
billg@lib10:/data4/test/mik$ ls
composer.json CONTRIBUTING.md LICENSE phpunit.xml.dist src
composer.lock extras mik README.md tests
billg@lib10:/data4/test/mik$ git checkout -b CdmPhpDocuments
Switched to a new branch 'CdmPhpDocuments'
billg@lib10:/data4/test/mik$
billg@lib10:/data4/test/mik$ git reset --hard 9c6b8c5
HEAD is now at 9c6b8c5 Work on #397.
billg@lib10:/data4/test/mik$ ls
composer.json composer.lock CONTRIBUTING.md extras LICENSE mik README.md src tests
billg@lib10:/data4/test/mik$ cd src
billg@lib10:/data4/test/mik/src$ ls
config fetchers filemanipulators metadataparsers
exceptions filegettermanipulators inputvalidators utilities
fetchermanipulators filegetters metadatamanipulators writers
billg@lib10:/data4/test/mik/src$ cd filegetters
billg@lib10:/data4/test/mik/src/filegetters$ ls
CdmBooks.php CdmPdfDocuments.php CsvCompound.php FileGetter.php OaipmhXpath.php
CdmCompound.php CdmSingleFile.php CsvNewspapers.php OaipmhIslandoraObj.php
CdmNewspapers.php CsvBooks.php CsvSingleFile.php OaipmhOjsPdf.php
billg@lib10:/data4/test/mik/src/filegetters$
Anything wrong?
I gave you the wrong commit hash. Try b6b8f0a280509cdae4ff11324c99ef14ffad8781, which puts the old filegetter back.
@mjordan It seems the vendor folder is missing in this version of the code. Here is the output:
billg@lib10:/data4/projects/arca$ ./mik/mik -c ./collections/AUebooks/config.ini
PHP Warning: require(vendor/autoload.php): failed to open stream: No such file or directory in /data4/projects/arca/mik/mik on line 10
PHP Fatal error: require(): Failed opening required 'vendor/autoload.php' (include_path='.:/usr/share/php') in /data4/projects/arca/mik/mik on line 10
billg@lib10:/data4/projects/arca$ cd mik
billg@lib10:/data4/projects/arca/mik$ ls
composer.json CONTRIBUTING.md LICENSE README_DEV.md src
composer.lock extras mik README.md tests
When I check that commit out, vendor is still there. Did you try running composer update after you checked out b6b8f0a280509cdae4ff11324c99ef14ffad8781?
It's also good to run composer dump-autoload so that any new classes are available via autoloading (after having run composer update to generate the vendor folder with any dependencies, etc.).
@MarcusBarnes I got the vendor folder, but now:
billg@lib10:/data4/projects/arca$ ./mik/mik -c ./collections/AUebooks/config.ini
PHP Fatal error: Uncaught Error: Class 'Commando\Command' not found in /data4/projects/arca/mik/mik:20
Stack trace:
#0 {main}
thrown in /data4/projects/arca/mik/mik on line 20
Do you still get that error after running composer dump-autoload?
After running composer dump-autoload, I have the vendor folder, but I still get the error above: Class 'Commando\Command' not found.
What do you see if you run ls vendor/nategood/commando/src/Commando/ from within the mik directory?
@xing93111 Following up on @mjordan's comment, double-check whether Commando is listed in your composer.json file (it might have been added after the commit that we're working from). If it's not there, overwrite your existing composer.json file with a copy of the latest composer.json file and then run composer install, then composer dump-autoload.
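Whether composer.json declares the package can also be checked from the command line. The sketch below is a plain text search, not a JSON parser, so it only reports that the package name appears as a key somewhere in the file. `declares_package` is a hypothetical name, and the package name `nategood/commando` is taken from the vendor path mentioned earlier in the thread.

```shell
# Succeeds when the given package name appears as a JSON key in the given file.
# Arguments: package name (e.g. nategood/commando), path to composer.json.
declares_package() {
  grep -q "\"$1\"[[:space:]]*:" "$2"
}

# Hypothetical usage from within the mik directory:
#   declares_package nategood/commando composer.json || echo "commando is not declared"
```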
The command line works now @MarcusBarnes. However, it still outputs corrupted PDFs as I mentioned above: https://github.com/MarcusBarnes/mik/issues/492#issuecomment-431150182
@xing93111 we could go back further in time until it works, but I am not convinced that your CONTENTdm is the same as SFU's was. Is there any way we can confirm that it can in fact allow a user to download a single multipage PDF from a compound PDF object?
@xing93111 More specifically... are there any objects where this is the case? And/or, can you try creating compound PDF object in your CDM with the option to download a full one?
If this is just a problem with the Athabasca CDM instance, it may be more productive to close the issue and just use the Automator scripts we discussed to convert the PDF pages to TIFFs.
And/or, can you try creating compound PDF object in your CDM with the option to download a full one?
Sorry, that is exactly what I think we need to confirm before looking closer at the MIK code. If we can confirm that the Athabasca CDM instance can produce multipage PDFs from compound single-page PDF documents, we will have narrowed the issue down to the MIK code, which we can then fix.
On this page: https://github.com/MarcusBarnes/mik/wiki/Toolchain:-CONTENTdm-compound-PDFs I read that the CdmPhpDocuments class should be used for compound PDFs. However, when I run mik, it does not work. Then I went to mik/src/filegetters and mik/src/writers and found a class named CdmPdfDocuments. So I thought there might be a typo in the documentation, and changed the class name to CdmPdfDocuments. However, it still does not work: the output is corrupted PDFs.
This is the collection: http://digicon.athabascau.ca/cdm/landingpage/collection/AUebooks
The following is my configuration ini file: