eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with miminal effort.
https://docspell.org
GNU Affero General Public License v3.0
1.57k stars 117 forks source link

Invalid hexadecimal character 'x' at index 0 #709

Closed voarsh2 closed 3 years ago

voarsh2 commented 3 years ago

Docspell seems unable to process my scan. It is now in stuck state as well.

2021-03-12T16:23:39: ============ Start processing scan 21-03-12 15-50-54.pdf ============
2021-03-12T16:23:39: Checking for duplicate files
2021-03-12T16:23:39: Creating new item with 1 attachment(s)
2021-03-12T16:23:40: Creating item finished in 293 ms
2021-03-12T16:23:40: Not an archive: application/pdf
2021-03-12T16:23:40: Converting file Some(scan 21-03-12 15-50-54.pdf) (application/pdf) into a PDF
2021-03-12T16:23:40: Storing input to file /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile for running ocrmypdf
2021-03-12T16:23:40: Running external command: ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf
2021-03-12T16:23:50: Command `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf` finished: 0
2021-03-12T16:23:50: ocrmypdf stdout:
2021-03-12T16:23:50: ocrmypdf stderr: 1  lots of diacritics - possibly poor OCR Postprocessing... Optimize ratio: 1.00 savings: 0.0% Output file is a PDF/A-2B (as expected)
2021-03-12T16:23:50: Conversion to pdf successful. Saving file.
2021-03-12T16:23:50: Closing process: `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf`
2021-03-12T16:23:50: Starting text extraction for 1 files
2021-03-12T16:23:50: Extracting text for attachment scan 21-03-12 15-50-54.converted
2021-03-12T16:23:50: Trying to strip text from pdf using pdfbox.
2021-03-12T16:23:50: Extracting text for attachment scan 21-03-12 15-50-54.converted finished in 36 ms
2021-03-12T16:23:50: Storing extracted texts …
2021-03-12T16:23:50: Extracted text stored.
2021-03-12T16:23:50: Updating SOLR index
2021-03-12T16:23:50: Text extraction finished in 166 ms.
2021-03-12T16:23:50: Creating preview images for 1 files…
2021-03-12T16:23:51: Preview generated, saving to database…
2021-03-12T16:23:51: Retrieving page count for 1 files…
2021-03-12T16:23:51: Found number of pages: 1
2021-03-12T16:23:51: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1)
2021-03-12T16:23:51: Stored page count (1).
2021-03-12T16:23:51: Starting text analysis
2021-03-12T16:23:51: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later.: Invalid hexadecimal character 'x' at index 0
eikek commented 3 years ago

thank you for reporting! could you maybe attach also the logs of joex? thank you! if the file is not confidential, it would help to have it. I know it's usually not possible.

voarsh2 commented 3 years ago

thank you for reporting! could you maybe attach also the logs of joex? thank you! if the file is not confidential, it would help to have it. I know it's usually not possible.

It seemed to process the file eventually, so I am not sure why it failed in the first place.

In any case, here'sa the Joex logs within the last 60 mins:

 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:40.319427Z Info 9LBtYg4Jy.../reese/process-item/High: Converting file Some(scan 21-03-12 15-50-54.pdf) (application/pdf) into a PDF 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:40.320338Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing input to file /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile for running ocrmypdf 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.827579Z Debug HoG5TrgvY.../reese/process-item/High: NerTag 'Location' is currently not used. Ignoring value '%wandsworth%'. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.827688Z Debug HoG5TrgvY.../reese/process-item/High: Looking for organizations: %sainsbury% 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:40.340392Z Debug 9LBtYg4Jy.../reese/process-item/High: Running external command: ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.881309Z Debug HoG5TrgvY.../reese/process-item/High: Looking up classifier results: List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(),Sainsbury’s),Set(),None)))) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:37.883757Z Debug HoG5TrgvY.../reese/process-item/High: Found classifier organization for MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(),Sainsbury’s),Set(),None))): Some(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s)) 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:38.289604Z Info HoG5TrgvY.../reese/process-item/High: Storing proposals 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:38.291061Z Debug HoG5TrgvY.../reese/process-item/High: Storing attachment proposals: MetaProposalList(List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s),Set(NerLabel(Sainsbury,Organization,0,9)),Some(0.0)), Candidate(IdRef(Ident(HwDWnrgLgiu-DCi73Qav2V9-pcKqUFZ4BGg-y7A2TqxVYyD),THE LONDON CYCLE WORKSHOP),Set(NerLabel(shop,Organization,504,508), NerLabel(the,Organization,376,379), NerLabel(London,Organization,95,101)),Some(0.014410314751611681)), Candidate(IdRef(Ident(G9dzjP1xE3B-UdnJ3yF6EVS-mbhziSvG89A-GFyQ4Ev1oJ1),Hoo Hing Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(BLTeRrAcfJu-U3Evwj6EBv2-zBFSRPFtsrC-6N5mGXjNXVM),Lunaweb Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(2Z2njWuQUdf-ptg9hfvTDqd-xWWbXyfN6Qy-qN7bJsf7FTq),Mirage Cigarettes Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(8u8SVnAptTj-M3NtodYkww9-Lyqf1VjiLxs-ZVnRmyhCJu8),Post Office Ltd),Set(NerLabel(Ltd,Organization,78,81)),Some(0.03549488054607509)), Candidate(IdRef(Ident(ATXVuT9GnNM-WxSGpSECa9G-aAeh1TYEGHB-zb5Sed8bmKL),Trinity College London),Set(NerLabel(London,Organization,95,101)),Some(0.04323094425483504)), Candidate(IdRef(Ident(CFPKMNfuJXx-yb856dwws4v-9JPDuy1C23h-rva1hK242mZ),Home Office),Set(NerLabel(HOME,Organization,165,169)),Some(0.07508532423208192)), Candidate(IdRef(Ident(33oASD4SJKd-jRX5U77kjVn-H22vrLvpqn2-9UxehHtQsLX),Toy Shop Limited),Set(NerLabel(shop,Organization,504,508)),Some(0.22935153583617748)))))) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:38.494785Z Debug HoG5TrgvY.../reese/process-item/High: Storing classifier proposals: MetaProposalList(List(MetaProposal(CorrOrg,NonEmptyList(Candidate(IdRef(Ident(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu),Sainsbury’s),Set(),None))))) 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:39.047051Z Info HoG5TrgvY.../reese/process-item/High: Starting linking proposals 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:39.047094Z Info HoG5TrgvY.../reese/process-item/High: Found many (9, NonEmptyList(B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu, HwDWnrgLgiu-DCi73Qav2V9-pcKqUFZ4BGg-y7A2TqxVYyD, G9dzjP1xE3B-UdnJ3yF6EVS-mbhziSvG89A-GFyQ4Ev1oJ1, BLTeRrAcfJu-U3Evwj6EBv2-zBFSRPFtsrC-6N5mGXjNXVM, 2Z2njWuQUdf-ptg9hfvTDqd-xWWbXyfN6Qy-qN7bJsf7FTq, 8u8SVnAptTj-M3NtodYkww9-Lyqf1VjiLxs-ZVnRmyhCJu8, ATXVuT9GnNM-WxSGpSECa9G-aAeh1TYEGHB-zb5Sed8bmKL, CFPKMNfuJXx-yb856dwws4v-9JPDuy1C23h-rva1hK242mZ, 33oASD4SJKd-jRX5U77kjVn-H22vrLvpqn2-9UxehHtQsLX)) candidates for CorrOrg. Setting first. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.047098Z Debug HoG5TrgvY.../reese/process-item/High: Updating item organization with: B3e8p8nw4Kj-YsqmDYwKQG8-AB11zbcokww-3XkR8BYEZu 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069421Z Debug HoG5TrgvY.../reese/process-item/High: No value for CorrPerson 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069468Z Debug HoG5TrgvY.../reese/process-item/High: No value for ConcPerson 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069492Z Debug HoG5TrgvY.../reese/process-item/High: No value for ConcEquip 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069541Z Debug HoG5TrgvY.../reese/process-item/High: No value for DocDate 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.069578Z Debug HoG5TrgvY.../reese/process-item/High: No value for DueDate 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:39.072570Z Info HoG5TrgvY.../reese/process-item/High: Starting setting given data 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:39.073531Z Debug HoG5TrgvY.../reese/process-item/High: Set item folder: 'None' 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:39.447413Z Info HoG5TrgvY.../reese/process-item/High: Set tags from given data: List() 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:39.482879Z Info HoG5TrgvY.../reese/process-item/High: Job execution successful 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.507603Z Debug 9LBtYg4Jy.../reese/process-item/High: Command `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf` finished: 0 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.509611Z Debug 9LBtYg4Jy.../reese/process-item/High: ocrmypdf stdout:  
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.509620Z Debug 9LBtYg4Jy.../reese/process-item/High: ocrmypdf stderr:     1  lots of diacritics - possibly poor OCR
Postprocessing...
Optimize ratio: 1.00 savings: 0.0%
Output file is a PDF/A-2B (as expected)

 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:50.509909Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf successful. Saving file. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.679532Z Debug 9LBtYg4Jy.../reese/process-item/High: Closing process: `ocrmypdf -l eng --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/infile /tmp/docspell-convert/docspell-ocrmypdf17971578592602581578/out.pdf` 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:50.704035Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.704144Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracting text for attachment scan 21-03-12 15-50-54.converted 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.707282Z Debug 9LBtYg4Jy.../reese/process-item/High: Trying to strip text from pdf using pdfbox. 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:50.740160Z Info 9LBtYg4Jy.../reese/process-item/High: Extracting text for attachment scan 21-03-12 15-50-54.converted finished in 36 ms 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.740236Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts … 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.782285Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:50.782455Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:50.870090Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 166 ms. 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:50.870174Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.257123Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database… 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:51.365623Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.376003Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.376025Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1) 
 INFO  d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:23:51.446576Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1). 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:23:51.448267Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis 
 ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Stuck. Releasing resources. 
 DEBUG d.j.s.SchedulerImpl - Permit released (0 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 ERROR d.j.s.LogSink - >>> 2021-03-12T16:23:51.834968Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High) 
 DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54) 
 DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free) 
 DEBUG d.j.s.SchedulerImpl - Starting task now 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:04.976190Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============ 
 WARN  d.j.s.LogSink - >>> 2021-03-12T16:37:04.986070Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.092506Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx)) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.149246Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:05.205178Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf). 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:05.229682Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:05.230457Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.230484Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts … 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.248529Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:05.248734Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:06.317930Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 1088 ms. 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:06.318071Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.792179Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database… 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:06.845173Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.859165Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.859255Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T16:37:06.860807Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1). 
 DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(819012ms) > Duration(60000ms)). Check for new state. 
 DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file. 
 INFO  d.j.a.RegexNerFile - Generating custom NER file for collective 'reese' 
 DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese' 
 INFO  d.j.s.LogSink - >>> 2021-03-12T16:37:06.890754Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis 
 INFO  d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between. 
 ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Stuck. Releasing resources. 
 DEBUG d.j.s.SchedulerImpl - Permit released (0 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 ERROR d.j.s.LogSink - >>> 2021-03-12T16:37:07.388676Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retrying later. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 INFO  d.a.n.PipelineCache - Clearing stanford nlp cache now! 
 WARN  e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High) 
 DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54) 
 DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free) 
 DEBUG d.j.s.SchedulerImpl - Starting task now 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.085595Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============ 
 WARN  d.j.s.LogSink - >>> 2021-03-12T17:07:05.091475Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.093368Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx)) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.273489Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.362444Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf). 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.513455Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.513650Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.513685Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts … 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529570Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529670Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.722953Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 209 ms. 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.723070Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.310647Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database… 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:06.363117Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376701Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376723Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1) 
 DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(1799509ms) > Duration(60000ms)). Check for new state. 
 DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file. 
 INFO  d.j.a.RegexNerFile - Generating custom NER file for collective 'reese' 
 DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese' 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.398411Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1). 
 INFO  d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between. 
 INFO  d.a.n.StanfordNerAnnotator - Creating english Stanford NLP NER classifier... 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator tokenize 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator ssplit 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator pos 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:06.399522Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis 
 INFO  e.s.n.t.m.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done . 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator lemma 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator ner 
 INFO  e.s.n.i.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done . 
 INFO  e.s.n.p.NERCombinerAnnotator - numeric classifiers: false; SUTime: false ; fine grained: false 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator regexner 
 INFO  e.s.n.p.TokensRegexNERAnnotator - regexner: Read 156 unique entries out of 156 from /tmp/docspell-analysis/regexner-10380637799222175923/reese.txt, 0 TokensRegex patterns. 
 WARN  d.j.s.LogSink - >>> 2021-03-12T17:07:09.127290Z Warn 9LBtYg4Jy.../reese/process-item/High: Processing failed on last retry. Creating item but without proposals. 
 ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Failed. Releasing resources. 
 DEBUG d.j.s.SchedulerImpl - Permit released (0 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 ERROR d.j.s.LogSink - >>> 2021-03-12T17:07:09.133759Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 INFO  d.a.n.PipelineCache - Clearing stanford nlp cache now! 
 WARN  e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
root@DOCSPELL:~# ^C
root@DOCSPELL:~# docker logs 6748cb283598 -since 60m
unknown shorthand flag: 's' in -since
See 'docker logs --help'.
root@DOCSPELL:~# docker logs -since 60m 6748cb283598           
unknown shorthand flag: 's' in -since
See 'docker logs --help'.
root@DOCSPELL:~# docker logs 6748cb283598 --since 60m
 INFO  d.a.n.PipelineCache - Clearing stanford nlp cache now! 
 WARN  e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 DEBUG d.j.s.SchedulerImpl - Next job found: Some(9LBtYg4Jy.../reese/process-item/High) 
 DEBUG d.j.s.SchedulerImpl - Creating context for job 9LBtYg4Jy.../reese/process-item/High to run JobTask(Ident(process-item),docspell.joex.scheduler.Task$$anonfun$contramap$4@42a8bf9f,docspell.joex.scheduler.Task$$anonfun$contramap$4@143e5e54) 
 DEBUG d.j.s.SchedulerImpl - Forking job 9LBtYg4Jy.../reese/process-item/High 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (0 free) 
 DEBUG d.j.s.SchedulerImpl - Starting task now 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.085595Z Info 9LBtYg4Jy.../reese/process-item/High: ============ Start processing scan 21-03-12 15-50-54.pdf ============ 
 WARN  d.j.s.LogSink - >>> 2021-03-12T17:07:05.091475Z Warn 9LBtYg4Jy.../reese/process-item/High: Found 1 existing item with these files. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.093368Z Debug 9LBtYg4Jy.../reese/process-item/High: Found 1 attachments. Use only those from task args: Set(Ident(4PvKnuVuYnX-Xmd37zFbcmU-67ZKxcy7vSR-HediYtAZSMx)) 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.273489Z Debug 9LBtYg4Jy.../reese/process-item/High: Not an archive: application/pdf 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.362444Z Info 9LBtYg4Jy.../reese/process-item/High: Conversion to pdf already done for attachment Some(scan 21-03-12 15-50-54.converted.pdf). 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.513455Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text extraction for 1 files 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.513650Z Info 9LBtYg4Jy.../reese/process-item/High: TextExtraction skipped, since text is already available. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.513685Z Debug 9LBtYg4Jy.../reese/process-item/High: Storing extracted texts … 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529570Z Debug 9LBtYg4Jy.../reese/process-item/High: Extracted text stored. 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:05.529670Z Debug 9LBtYg4Jy.../reese/process-item/High: Updating SOLR index 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.722953Z Info 9LBtYg4Jy.../reese/process-item/High: Text extraction finished in 209 ms. 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:05.723070Z Info 9LBtYg4Jy.../reese/process-item/High: Creating preview images for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.310647Z Debug 9LBtYg4Jy.../reese/process-item/High: Preview generated, saving to database… 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:06.363117Z Info 9LBtYg4Jy.../reese/process-item/High: Retrieving page count for 1 files… 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376701Z Debug 9LBtYg4Jy.../reese/process-item/High: Found number of pages: 1 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.376723Z Debug 9LBtYg4Jy.../reese/process-item/High: Update attachment Gbivnz2MTzn-tUxHoioS6XK-UynhTxEKDsG-2sTYLjioYrx with page count Some(1) 
 DEBUG d.j.a.RegexNerFile - Cache time elapsed (Duration(1799509ms) > Duration(60000ms)). Check for new state. 
 DEBUG d.j.a.RegexNerFile - There have been state changes for collective 'reese'. Reload NER file. 
 INFO  d.j.a.RegexNerFile - Generating custom NER file for collective 'reese' 
 DEBUG d.j.a.RegexNerFile - Writing custom NER file for collective 'reese' 
 DEBUG d.j.s.LogSink - >>> 2021-03-12T17:07:06.398411Z Debug 9LBtYg4Jy.../reese/process-item/High: Stored page count (1). 
 INFO  d.a.n.PipelineCache - Cancel stanford cache clearing, as it has been used in between. 
 INFO  d.a.n.StanfordNerAnnotator - Creating english Stanford NLP NER classifier... 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator tokenize 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator ssplit 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator pos 
 INFO  d.j.s.LogSink - >>> 2021-03-12T17:07:06.399522Z Info 9LBtYg4Jy.../reese/process-item/High: Starting text analysis 
 INFO  e.s.n.t.m.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done . 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator lemma 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator ner 
 INFO  e.s.n.i.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done . 
 INFO  e.s.n.p.NERCombinerAnnotator - numeric classifiers: false; SUTime: false ; fine grained: false 
 INFO  e.s.n.p.StanfordCoreNLP - Adding annotator regexner 
 INFO  e.s.n.p.TokensRegexNERAnnotator - regexner: Read 156 unique entries out of 156 from /tmp/docspell-analysis/regexner-10380637799222175923/reese.txt, 0 TokensRegex patterns. 
 WARN  d.j.s.LogSink - >>> 2021-03-12T17:07:09.127290Z Warn 9LBtYg4Jy.../reese/process-item/High: Processing failed on last retry. Creating item but without proposals. 
 ERROR d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Job 9LBtYg4Jy.../reese/process-item/High done Failed. Releasing resources. 
 DEBUG d.j.s.SchedulerImpl - Permit released (0 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 ERROR d.j.s.LogSink - >>> 2021-03-12T17:07:09.133759Z Error 9LBtYg4Jy.../reese/process-item/High: Job 9LBtYg4Jy.../reese/process-item/High execution failed. Retries exceeded. 
java.lang.IllegalArgumentException: Invalid hexadecimal character 'x' at index 0
        at scodec.bits.ByteVector$.$anonfun$fromValidHex$1(ByteVector.scala:1816)
        at scala.util.Either.fold(Either.scala:190)
        at scodec.bits.ByteVector$.fromValidHex(ByteVector.scala:1816)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1(LenientUri.scala:237)
        at docspell.common.LenientUri$.$anonfun$percentDecode$1$adapted(LenientUri.scala:236)
        at scala.collection.StringOps$.foldLeft$extension(StringOps.scala:1058)
        at docspell.common.LenientUri$.percentDecode(LenientUri.scala:236)
        at docspell.common.LenientUri$.$anonfun$parse$1(LenientUri.scala:157)
        at scala.collection.immutable.List.map(List.scala:246)
        at docspell.common.LenientUri$.makePath$1(LenientUri.scala:157)
        at docspell.common.LenientUri$.splitPathQF$1(LenientUri.scala:173)
        at docspell.common.LenientUri$.parse(LenientUri.scala:212)
        at docspell.analysis.contact.Contact$.isWebsite(Contact.scala:37)
        at docspell.analysis.contact.Contact$.$anonfun$annotate$1(Contact.scala:18)
        at fs2.Chunk$Singleton.map(Chunk.scala:534)
        at fs2.internal.FreeC$Output.$anonfun$mapOutput$2(Algebra.scala:288)
        at fs2.internal.FreeC$$anon$7.cont(Algebra.scala:200)
        at fs2.internal.FreeC$ViewL$$anon$9$$anon$10.<init>(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$$anon$9.cont(Algebra.scala:240)
        at fs2.internal.FreeC$ViewL$.mk(Algebra.scala:231)
        at fs2.internal.FreeC$ViewL$.apply(Algebra.scala:220)
        at fs2.internal.FreeC.viewL(Algebra.scala:106)
        at fs2.internal.FreeC$.go$1(Algebra.scala:421)
        at fs2.internal.FreeC$.$anonfun$compile$10(Algebra.scala:477)
        at fs2.internal.FreeC$.$anonfun$compile$1(Algebra.scala:437)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.$anonfun$close$9(CompileScope.scala:246)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at map @ fs2.internal.CompileScope.fs2$internal$CompileScope$$traverseError(CompileScope.scala:222)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
        at flatMap @ fs2.Stream$Compiler$$anon$5.apply(Stream.scala:4228)
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 INFO  d.a.n.PipelineCache - Clearing stanford nlp cache now! 
 WARN  e.s.n.p.StanfordCoreNLP - Clearing CoreNLP annotation pool; this should be unnecessary in production 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.SchedulerImpl - Notify signal, going into main loop 
 DEBUG d.j.s.SchedulerImpl - Try to acquire permit (1 free) 
 DEBUG d.j.s.SchedulerImpl - New permit acquired 
 DEBUG d.j.s.SchedulerImpl - Next job found: None 
 DEBUG d.j.s.SchedulerImpl - Waiting for notify 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 
 DEBUG d.j.s.PeriodicSchedulerImpl - Periodic awake reached 

The file in question is attached too. It wasn't confidential in this case. scan 21-03-12 15-50-54.pdf

eikek commented 3 years ago

Thank you very much!

eikek commented 3 years ago

This was a quite embarrassing bug. Thank you for your patients and feedback! It is fixed now on master and will be in the next release.