bcgov / ckanext-bcgov

BC Data Catalogue source code, main ckan extension
http://catalogue.data.gov.bc.ca
GNU Affero General Public License v3.0

PDF preview not working consistently in production #365

Mbrownshoes closed this issue 6 years ago

Mbrownshoes commented 7 years ago

This has been more clearly described in #470.

See the TESTCASE further below in this issue.

We are unable to preview this draft record's PDF resource (it results in a server error): https://catalogue.data.gov.bc.ca/dataset/test-geo-1-2/resource/bf5e743a-4320-40dc-8a68-7bdbb11bf064. However, the download does work. This resource was uploaded to the catalogue, but linked PDFs have experienced the same problem. We also experienced this problem with https://catalogue.data.gov.bc.ca/dataset/caribou-habitat-model-for-the-eastern-cariboo-region-columbia-highlands-northern-columbia-mountains, so I moved the PDF to the description.

Other PDF previews do work, such as this one (a resource in the same record as above): https://catalogue.data.gov.bc.ca/dataset/test-geo-1-2/resource/1e928ab5-cdd9-4769-8be7-c84ef6944e5c

This behaviour is inconsistent: I was able to see PDF previews in CAT and CAD for both newly created and older resources.

@ll911 suggests Solr might need to be upgraded to fix this. @dkelsey

garrettH3S commented 6 years ago

@ll911 This error seems to occur when the resource is served through the proxy. PDF resource: https://catalogue.data.gov.bc.ca/dataset/caribou-habitat-model-for-the-western-cariboo-region-2001/resource/b8809199-96e9-4415-b8b0-899eb8411118/proxy

[Screenshot taken 2018-01-04 at 1:37:03 PM]

ll911 commented 6 years ago

This is controlled by ckan.resource_proxy.max_file_size; the default is 1 MB. Changing this parameter will affect performance, so we need some justification before changing the value.
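To make the limit concrete, here is a minimal sketch of the kind of size check the proxy setting implies. It is an illustration under stated assumptions, not CKAN's actual resourceproxy code: the helper names max_proxy_size and can_proxy are hypothetical, the config is modelled as a plain dict, and the resource size is assumed to be known up front (for example from a Content-Length header). Only the setting name and the 1 MB default come from the comment above.

```python
# Hypothetical sketch of a proxy size limit. It mirrors the idea behind
# ckan.resource_proxy.max_file_size but is NOT CKAN's resourceproxy code.
DEFAULT_MAX_FILE_SIZE = 1024 * 1024  # 1 MB = 1048576 bytes, the default cited above


def max_proxy_size(config: dict) -> int:
    """Read the proxy limit (in bytes) from a config mapping, falling back to 1 MB."""
    return int(config.get("ckan.resource_proxy.max_file_size", DEFAULT_MAX_FILE_SIZE))


def can_proxy(content_length: int, config: dict) -> bool:
    """Return True if a resource of content_length bytes fits under the proxy limit."""
    return content_length <= max_proxy_size(config)


if __name__ == "__main__":
    print(can_proxy(500_000, {}))    # small PDF: under the default limit
    print(can_proxy(1_800_000, {}))  # 1.8 MB PDF: over the default 1 MB limit
    # Raising the limit to 150 MB (150 * 1024 * 1024 bytes) lets it through.
    print(can_proxy(1_800_000, {"ckan.resource_proxy.max_file_size": "157286400"}))
```

The trade-off raised above shows up directly here. Raising the value only widens the comparison, but every proxied request up to that size is then streamed through the CKAN server.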

garrettH3S commented 6 years ago

@dkelsey If this is an enhancement, what am I enhancing it to? Is this now a duplicate of #372?

dkelsey commented 6 years ago

@garrettH3S It's related, yes. The behavior I would expect is that if the PDF is larger than 1048576 bytes (1 MB), no preview is displayed.
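The expected behavior described above can be sketched as a small view-selection rule: show the PDF preview only when the file fits under the proxy limit, and otherwise fall back to the download link instead of surfacing a server error. This is a hypothetical illustration, not ckanext-bcgov or CKAN code. The function name choose_pdf_view and the "preview"/"download_only" labels are invented here, and treating an unknown size as "no preview" is an assumption.

```python
from typing import Optional

# Hypothetical constant: the ckan.resource_proxy.max_file_size default, in bytes.
PROXY_LIMIT = 1048576


def choose_pdf_view(size_bytes: Optional[int], limit: int = PROXY_LIMIT) -> str:
    """Hypothetical rule: preview a PDF only when it fits under the proxy limit."""
    if size_bytes is not None and size_bytes <= limit:
        return "preview"
    # Too large, or size unknown (assumption: be conservative): show no
    # preview, just the download link, rather than a proxy error page.
    return "download_only"
```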

dkelsey commented 6 years ago

My comments above are wrong. The issue is described correctly below in the TESTCASE and in #470.

I created a dataset to verify this issue:


@ll911 should ckan.resource_proxy.max_file_size be configured in CAD?

I looked at the current behavior in PROD: forcing a PDF to upload to the DataStore produces the same "pdftables is not installed" error listed below. I take this to mean we do not store PDFs in the DataStore; PDFs are saved only to the FileStore.
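If PDFs are meant to live only in the FileStore, the DataStore push can be limited to tabular formats so that PDFs never reach messytables' PDF parser (the source of the pdftables ImportError). This is a minimal sketch assuming a hypothetical allowlist. TABULAR_FORMATS and should_push_to_datastore are illustrative names, not DataPusher's API. As far as we know, CKAN's ckan.datapusher.formats setting is where this list is actually configured in a real deployment.

```python
# Hypothetical allowlist of formats worth pushing to the DataStore.
# Illustrative only; not DataPusher's actual configuration.
TABULAR_FORMATS = {"csv", "tsv", "xls", "xlsx", "ods"}


def should_push_to_datastore(resource_format: str) -> bool:
    """Send only tabular formats to the DataStore; PDFs stay in the FileStore."""
    return resource_format.strip().lower() in TABULAR_FORMATS
```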


1.8 MB file:

This behavior is consistent in CAT:

Error: File "/apps/ckan/tst/datapusher/lib/python2.7/site-packages/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/apps/cis/workspace/bcdc/bcdc-rc/src/datapusher/datapusher/jobs.py", line 404, in push_to_datastore
    table_set = messytables.any_tableset(tmp, mimetype=ct, extension=ct)
  File "/apps/ckan/tst/datapusher/lib/python2.7/site-packages/messytables/any.py", line 137, in any_tableset
    return parsers[attempt](fileobj, **kw)
  File "/apps/ckan/tst/datapusher/lib/python2.7/site-packages/messytables/pdf.py", line 50, in __init__
    raise ImportError("pdftables is not installed")
ImportError('pdftables is not installed',)

In CAD there is a different error:

'Could not connect to DataPusher'

18.5 MB file:

This is the behavior in CAT:

Error: Resource too large to process: 10492139 > max (10485760).
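The limit in this message is exactly 10 MiB (10 * 1024 * 1024 = 10485760 bytes), and the reported size of 10492139 bytes exceeds it by 6379 bytes. This sketch reproduces that comparison and message format. It assumes a hypothetical MAX_CONTENT_LENGTH constant and check_size helper and does not claim to be DataPusher's implementation.

```python
# Hypothetical size guard reproducing the CAT error above; not DataPusher's code.
MAX_CONTENT_LENGTH = 10 * 1024 * 1024  # 10485760 bytes, the max shown in the error


def check_size(content_length: int, max_length: int = MAX_CONTENT_LENGTH) -> None:
    """Raise if a resource is larger than the configured processing limit."""
    if content_length > max_length:
        raise ValueError(
            f"Resource too large to process: {content_length} > max ({max_length})."
        )
```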

In CAD there is a different error.

jeff-at-h3 commented 6 years ago

@dkelsey this issue is known not to be working. Dave, do you want us to continue working on it, or do you want to do more testing first?

dkelsey commented 6 years ago

@jeff-at-h3 I realized a while ago that I had the sizes wrong. The max size is 150 MB.
I'm testing this today.

dkelsey commented 6 years ago

This is a known issue, not a new one. Let's assess priorities. Hold off working on this for now.

dkelsey commented 6 years ago

This has now been more clearly described in #470

I figured out the test case.

ENV

CAD, CAT, PROD

TESTCASE