UB-Mannheim / tesseract

Tesseract Open Source OCR Engine (main repository)
Apache License 2.0

Link for file download not found. (404 error) (https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-5.3.1.20230401.exe) #70

Closed carlosfalcone closed 4 days ago

carlosfalcone commented 1 year ago

Current Behavior

No response

Expected Behavior

No response

Suggested Fix

No response

tesseract -v

No response

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

stweil commented 1 year ago

It works for me. When did you try the download? Do you still have a problem?

Hermann12 commented 1 year ago

Link works for me too.

carlosfalcone commented 1 year ago

Today the link is working. Thanks for your reply.

On Sun, Apr 30, 2023 at 02:01, Hermann12 @.***> wrote:

Link works for me too.

Anime37 commented 1 year ago

Had the same issue. Switching my DNS setting from manual to automatic solved the problem. I was using Cloudflare DNS.

kaixxx commented 1 year ago

I had the same problem. It might have to do with the cookie policy. After I visited https://digi.bib.uni-mannheim.de/ and answered the cookie prompt (I denied cookies), the download links worked fine.

stweil commented 1 year ago

The download link does not use any cookies. I think there is a DNS problem if downloads fail. Usually a retry (maybe later) should help. If you report the exact time (including time zone) of failing downloads, I can also check the web server logs for possible failures.
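For scripted downloads, the "retry (maybe later)" advice can be sketched in Python roughly as follows. This is only an illustration, not something provided by this repository: the helper name `download_with_retry`, its parameters, and the backoff values are all assumptions.

```python
import time
import urllib.request


def download_with_retry(url, fetch=None, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds, doubling the wait after each failure.

    Hypothetical helper: the name, parameters and defaults are illustrative only.
    """
    if fetch is None:
        # Default fetcher: read the whole response body into memory.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as response:
                return response.read()

    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # DNS errors, timeouts and urllib's URLError/HTTPError are all
            # OSError subclasses, so every one of them is retried here.
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
```

Calling `download_with_retry(url)` with the installer URL from the issue title would return the file contents as bytes, or re-raise the last error once all attempts are used up. Passing `fetch` and `sleep` as arguments keeps the retry logic testable without network access.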

damies13 commented 1 year ago

I encountered this about 30 min ago

Line |
   2 |  Invoke-WebRequest -Uri "https://digi.bib.uni-mannheim.de/tesseract/te …
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | No such host is known.

Would it be better if there were mirrors that hosted these files? I would happily download from a mirror that's a little closer to Australia rather than pulling data from the other side of the planet (I'm grateful I actually can even download files from so far away and it works so well most of the time)

anphex commented 1 year ago

It's on-off behaviour, guys. I tried some Python experiments on multiple machines and it was hit or miss. If a university in one of the most developed countries in the world isn't capable of running a basic website, that's all you need to know about IT progress in Germany.

stweil commented 1 year ago

@anphex, a more detailed bug report would be helpful. The web server is up more than 99.9% of the time, only restarted when necessary due to a new Linux kernel. What exactly is failing? Are you getting timeouts? Is name resolution failing? From which part of the world are downloads failing?
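To answer these questions from a script, the exception raised by a failed download can be sorted into the categories asked about here. The following is a rough Python sketch that assumes the download is made with `urllib.request.urlopen`; the function name `classify_failure` and its labels are made up for illustration.

```python
import socket
import urllib.error


def classify_failure(exc):
    """Map a download exception to a short label for a bug report.

    Hypothetical helper: the labels are illustrative, not a standard.
    """
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http {exc.code}"
    # urlopen wraps lower-level errors in URLError; unwrap the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, socket.gaierror):
        return "dns"  # name resolution failed, e.g. "No such host is known."
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    return "other"
```

Logging this label together with a timestamp that includes the time zone would give the kind of detail requested above.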

anphex commented 1 year ago

Good morning! I was really annoyed yesterday because installing the tesseract exe was one of the last steps in finishing a script and it was already late. Sorry for my mean comment. The only thing I can "confirm" from my Chrome history is that no connection was possible at 22:25 German time.

damies13 commented 1 year ago

@stweil

If it helps, I can give the dates and times when the download failed in my build process and when it succeeded.

Times when file download failed:

Times when file download succeeded:

As you can see, there are often only a few seconds between a "No such host is known." error and a successful download.

I hope this is helpful in finding the issue,

Dave.

stweil commented 1 year ago

@damies13, that's a special case where the access was not possible most of the time because of a heavy denial of service attack which lasted more than 24 hours.