Closed voarsh2 closed 3 years ago
thank you for reporting! could you maybe attach also the logs of joex? thank you! if the file is not confidential, it would help to have it. I know it's usually not possible.
thank you for reporting! could you maybe attach also the logs of joex? thank you! if the file is not confidential, it would help to have it. I know it's usually not possible.
It seemed to process the file eventually, so I am not sure why it failed in the first place.
In any case, here'sa the Joex logs within the last 60 mins:
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:40.319427Z Info 9LBtYg4Jy.../reese/process-item/High: Converting file Some(scan 21-03-12 15-50-54.pdf) (application/pdf) into a PDF
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:40.320338Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing input to file /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile for running ocrmypdf
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.827579Z Debug HoG5TrgvY.../reese/process-item/High: NerTag 'Location' is currently not used. Ignoring value '%wandsworth%'.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.827688Z Debug HoG5TrgvY.../reese/process-item/High: Looking for organizations: %sainsbury%
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:40.340392Z Debug 9LBtYg4Jy.../reese/process-item/High: Running external command: ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.881309Z Debug HoG5TrgvY.../reese/process-item/High: Looking up classifier results: List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(),Sainsbury’s),Set(),None))))
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.883757Z Debug HoG5TrgvY.../reese/process-item/High: Found classifier organization for MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(),Sainsbury’s),Set(),None))): Some(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s))
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:38.289604Z Info HoG5TrgvY.../reese/process-item/High: Storing proposals
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:38.291061Z Debug HoG5TrgvY.../reese/process-item/High: Storing attachment proposals: MetaProposalList(List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s),Set(NerLabel(Sainsbury,Organization,0,9)),Some(0.0)), Candidate(IdRef(Ident(HwDWnrgLgiu-DCi73Qav2V9-pcKqUFZ4BGg-y7A2TqxVYyD),THE LONDON CYCLE WORKSHOP),Set(NerLabel(shop,Organization,504,508), NerLabel(the,Organization,376,379), NerLabel(London,Organization,95,101)),Some(0.014410314751611681)), Candidate(IdRef(Ident(G9dzjP1xE3B-UdnJ3yF6EVS-mbhziSvG89A-GFyQ4Ev1oJ1),Hoo Hing Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(BLTeRrAcfJu-U3Evwj6EBv2-zBFSRPFtsrC-6N5mGXjNXVM),Lunaweb Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(2Z2njWuQUdf-ptg9hfvTDqd-xWWbXyfN6Qy-qN7bJsf7FTq),Mirage Cigarettes Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(8u8SVnAptTj-M3NtodYkww9-Lyqf1VjiLxs-ZVnRmyhCJu8),Post Office Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(ATXVuT9GnNM-WxSGpSECa9G-aAeh1TYEGHB-zb5Sed8bmKL),Trinity College London),Set(NerLabel(London,Organization,95,101)),Some(0.04323094425483504)), Candidate(IdRef(Ident(CFPKMNfuJXx-yb856dwws4v-9JPDuy1C23h-rva1hK242mZ),Home Office),Set(NerLabel(HOME,Organization,165,169)),Some(0.07508532423208192)), Candidate(IdRef(Ident(33oASD4SJKd-jRX5U77kjVn-H22vrLvpqn2-9UxehHtQsLX),Toy Shop Limited),Set(NerLabel(shop,Organization,504,508)),Some(0.22935153583617748))))))
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:38.494785Z Debug HoG5TrgvY.../reese/process-item/High: Storing classifier proposals: MetaProposalList(List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s),Set(),None)))))
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:39.047051Z Info HoG5TrgvY.../reese/process-item/High: Starting linking proposals
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:39.047094Z Info HoG5TrgvY.../reese/process-item/High: Found many (9, NonEmptyList(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu, HwDWnrgLgiu-DCi73Qav2V9-pcKqUFZ4BGg-y7A2TqxVYyD, G9dzjP1xE3B-UdnJ3yF6EVS-mbhziSvG89A-GFyQ4Ev1oJ1, BLTeRrAcfJu-U3Evwj6EBv2-zBFSRPFtsrC-6N5mGXjNXVM, 2Z2njWuQUdf-ptg9hfvTDqd-xWWbXyfN6Qy-qN7bJsf7FTq, 8u8SVnAptTj-M3NtodYkww9-Lyqf1VjiLxs-ZVnRmyhCJu8, ATXVuT9GnNM-WxSGpSECa9G-aAeh1TYEGHB-zb5Sed8bmKL, CFPKMNfuJXx-yb856dwws4v-9JPDuy1C23h-rva1hK242mZ, 33oASD4SJKd-jRX5U77kjVn-H22vrLvpqn2-9UxehHtQsLX)) candidates for CorrOrg. Setting first.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.047098Z Debug HoG5TrgvY.../reese/process-item/High: Updating item organization with: B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069421Z Debug HoG5TrgvY.../reese/process-item/High: No value for CorrPerson
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069468Z Debug HoG5TrgvY.../reese/process-item/High: No value for ConcPerson
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069492Z Debug HoG5TrgvY.../reese/process-item/High: No value for ConcEquip
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069541Z Debug HoG5TrgvY.../reese/process-item/High: No value for DocDate
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069578Z Debug HoG5TrgvY.../reese/process-item/High: No value for DueDate
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:39.072570Z Info HoG5TrgvY.../reese/process-item/High: Starting setting given data
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.073531Z Debug HoG5TrgvY.../reese/process-item/High: Set item folder: 'None'
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:39.447413Z Info HoG5TrgvY.../reese/process-item/High: Set tags from given data: List()
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:39.482879Z Info HoG5TrgvY.../reese/process-item/High: Job execution successful
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.507603Z Debug 9LBtYg4Jy.../reese/process-item/High: Command `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf` finished: 0
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.509611Z Debug 9LBtYg4Jy.../reese/process-item/High: ocrmypdf stdout:
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.509620Z Debug 9LBtYg4Jy.../reese/process-item/High: ocrmypdf stderr: 1 lots of diacritics - possibly poor OCR
Postprocessing...
Optimize ratio: 1.00 savings: 0.0%
Output file is a PDF/A-2B (as expected)
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:50.509909Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf successful. Saving file.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.679532Z Debug 9LBtYg4Jy.../reese/process-item/High: Closing process: `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf`
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:50.704035Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.704144Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracting text for attachment scan 21-03-12 15-50-54.converted
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.707282Z Debug 9LBtYg4Jy.../reese/process-item/High: Trying to strip text from pdf using pdfbox.
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:50.740160Z Info 9LBtYg4Jy.../reese/process-item/High: Extracting text for attachment scan 21-03-12 15-50-54.converted finished in 36 ms
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.740236Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts …
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.782285Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.782455Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:50.870090Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 166 ms.
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:50.870174Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.257123Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database…
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:51.365623Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.376003Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.376025Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1)
INFO d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.446576Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1).
INFO d.j.s.LogSink - >>> 2021-03-12T16:23:51.448267Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis
ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Stuck. Releasing resources.
DEBUG d.j.s.SchedulerImpl - Permit released (0 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
ERROR d.j.s.LogSink - >>> 2021-03-12T16:23:51.834968Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High)
DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54)
DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free)
DEBUG d.j.s.SchedulerImpl - Starting task now
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:04.976190Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============
WARN d.j.s.LogSink - >>> 2021-03-12T16:37:04.986070Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.092506Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx))
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.149246Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:05.205178Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf).
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:05.229682Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:05.230457Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.230484Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts …
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.248529Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored.
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.248734Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:06.317930Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 1088 ms.
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:06.318071Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.792179Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database…
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:06.845173Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.859165Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.859255Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1)
DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.860807Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1).
DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(819012ms) > Duration(60000ms)). Check for new state.
DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file.
INFO d.j.a.RegexNerFile - Generating custom NER file for collective 'reese'
DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese'
INFO d.j.s.LogSink - >>> 2021-03-12T16:37:06.890754Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis
INFO d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between.
ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Stuck. Releasing resources.
DEBUG d.j.s.SchedulerImpl - Permit released (0 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
ERROR d.j.s.LogSink - >>> 2021-03-12T16:37:07.388676Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
INFO d.a.n.PipelineCache - Clearing stanford nlp cache now!
WARN e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High)
DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54)
DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free)
DEBUG d.j.s.SchedulerImpl - Starting task now
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.085595Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============
WARN d.j.s.LogSink - >>> 2021-03-12T17:07:05.091475Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.093368Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx))
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.273489Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.362444Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf).
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.513455Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.513650Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.513685Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts …
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529570Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529670Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.722953Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 209 ms.
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.723070Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.310647Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database…
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:06.363117Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376701Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376723Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1)
DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(1799509ms) > Duration(60000ms)). Check for new state.
DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file.
INFO d.j.a.RegexNerFile - Generating custom NER file for collective 'reese'
DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese'
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.398411Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1).
INFO d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between.
INFO d.a.n.StanfordNerAnnotator - Creating english Stanford NLP NER classifier...
INFO e.s.n.p.StanfordCoreNLP - Adding annotator tokenize
INFO e.s.n.p.StanfordCoreNLP - Adding annotator ssplit
INFO e.s.n.p.StanfordCoreNLP - Adding annotator pos
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:06.399522Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis
INFO e.s.n.t.m.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done .
INFO e.s.n.p.StanfordCoreNLP - Adding annotator lemma
INFO e.s.n.p.StanfordCoreNLP - Adding annotator ner
INFO e.s.n.i.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done .
INFO e.s.n.p.NERCombinerAnnotator - numeric classifiers: false; SUTime: false ; fine grained: false
INFO e.s.n.p.StanfordCoreNLP - Adding annotator regexner
INFO e.s.n.p.TokensRegexNERAnnotator - regexner: Read 156 unique entries out of 156 from /tmp/docspell-analysis/regexner-10380637799222175923/reese.txt, 0 TokensRegex patterns.
WARN d.j.s.LogSink - >>> 2021-03-12T17:07:09.127290Z Warn 9LBtYg4Jy.../reese/process-item/High: Processing failed on last retry. Creating item but without proposals.
ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Failed. Releasing resources.
DEBUG d.j.s.SchedulerImpl - Permit released (0 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
ERROR d.j.s.LogSink - >>> 2021-03-12T17:07:09.133759Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
INFO d.a.n.PipelineCache - Clearing stanford nlp cache now!
WARN e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
root@DOCSPELL:~# ^C
root@DOCSPELL:~# docker logs 6748cb283598 -since 60m
unknown shorthand flag: 's' in -since
See 'docker logs --help'.
root@DOCSPELL:~# docker logs -since 60m 6748cb283598
unknown shorthand flag: 's' in -since
See 'docker logs --help'.
root@DOCSPELL:~# docker logs 6748cb283598 --since 60m
INFO d.a.n.PipelineCache - Clearing stanford nlp cache now!
WARN e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High)
DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54)
DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free)
DEBUG d.j.s.SchedulerImpl - Starting task now
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.085595Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============
WARN d.j.s.LogSink - >>> 2021-03-12T17:07:05.091475Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.093368Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx))
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.273489Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.362444Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf).
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.513455Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.513650Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.513685Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts …
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529570Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored.
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529670Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.722953Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 209 ms.
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:05.723070Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.310647Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database…
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:06.363117Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files…
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376701Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376723Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1)
DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(1799509ms) > Duration(60000ms)). Check for new state.
DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file.
INFO d.j.a.RegexNerFile - Generating custom NER file for collective 'reese'
DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese'
DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.398411Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1).
INFO d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between.
INFO d.a.n.StanfordNerAnnotator - Creating english Stanford NLP NER classifier...
INFO e.s.n.p.StanfordCoreNLP - Adding annotator tokenize
INFO e.s.n.p.StanfordCoreNLP - Adding annotator ssplit
INFO e.s.n.p.StanfordCoreNLP - Adding annotator pos
INFO d.j.s.LogSink - >>> 2021-03-12T17:07:06.399522Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis
INFO e.s.n.t.m.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done .
INFO e.s.n.p.StanfordCoreNLP - Adding annotator lemma
INFO e.s.n.p.StanfordCoreNLP - Adding annotator ner
INFO e.s.n.i.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done .
INFO e.s.n.p.NERCombinerAnnotator - numeric classifiers: false; SUTime: false ; fine grained: false
INFO e.s.n.p.StanfordCoreNLP - Adding annotator regexner
INFO e.s.n.p.TokensRegexNERAnnotator - regexner: Read 156 unique entries out of 156 from /tmp/docspell-analysis/regexner-10380637799222175923/reese.txt, 0 TokensRegex patterns.
WARN d.j.s.LogSink - >>> 2021-03-12T17:07:09.127290Z Warn 9LBtYg4Jy.../reese/process-item/High: Processing failed on last retry. Creating item but without proposals.
ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Failed. Releasing resources.
DEBUG d.j.s.SchedulerImpl - Permit released (0 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
ERROR d.j.s.LogSink - >>> 2021-03-12T17:07:09.133759Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded.
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
at scala.util.Either.fold(Either.scala:190)
at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
at scala.collection.immutable.List.map(List.scala:246)
at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
at docspell.common.LenientUri$.parse(LenientUri.scala:212)
at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
at fs2.Chunk$Singleton.map(Chunk.scala:534)
at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
at fs2.internal.FreeC.viewL(Algebra.scala:106)
at fs2.internal.FreeC$.go$1(Algebra.scala:421)
at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
INFO d.a.n.PipelineCache - Clearing stanford nlp cache now!
WARN e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Periodic awake reached
DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop
DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free)
DEBUG d.j.s.SchedulerImpl - New permit acquired
DEBUG d.j.s.SchedulerImpl - Next job found: None
DEBUG d.j.s.SchedulerImpl - Waiting for notify
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached
The file in question is attached too. It wasn't confidential in this case. scan 21-03-12 15-50-54.pdf
Thank you very much!
This was a quite embarrassing bug. Thank you for your patients and feedback! It is fixed now on master and will be in the next release.
Docspell seems unable to process my scan. It is now in stuck state as well.