AVATEAM-IT-SYSTEMHAUS / mkdocs-kroki-plugin

MkDocs plugin for Kroki-Diagrams
MIT License

Kroki.io blocks requests coming from urllib.request.urlretrieve with a 403 #10

Closed: geertvanheusden closed this issue 2 months ago

geertvanheusden commented 2 years ago

I noticed that Kroki.io blocks requests coming from the urllib.request.urlretrieve call with a 403 (Forbidden) response. It might be that some request headers are missing, but I am not sure which ones. A simple curl to the same URL works just fine.

A simple example that doesn't work:

import urllib.request

urllib.request.urlretrieve("https://kroki.io/plantuml/svg/eNptU11v2kAQfL9fseUFIsW47VujCMUFGiElVQRJ2tezvYZrz3fW3Rrk_vrumhgalSfuY2Z2ZnzcRdKB2tqq8dw3XTDbHcHnj5--QFbrP95NC19fw8oVU_ABDEXQVWWs0YRxCpm1sBZKhDVGDHssp2q8eVr8TB5MgS5isirRkakMhht4XD3D5BvrlEja2HgNERF2RE28SdOtoV2by8BUH6LVeZTfxBTexaTyIWmsdmI1za3P01pHwpA-rObL75vllVIfjCtsWyLcCt3kafZjM_d17d1MqTG8OFau2QzQDqHy1vqDcVuwxiGQhyIgh4Jo6saK3xL2Bg9MvKS7OaFmFwbfo8OgbfrClcTZ__ePPjcW0-xpdc8jD7q7gNlg0QZD3bE_6jJXchgeql2B6dxvnSF_gSegljB90HVe6guAhSad64jponO69ouvnMBiRdLB8fOXJmBBxjul-gST6NtQIH-u0XLPbuKIV7WxliERfAWtoEZX6hxosvfE5fIBQ1_7NfBGiAIGvkZhvOWYyGHW0o7vZSTIWnIXWnwI7dehdxjbvDZ04h9TTrZ94YTP_je6QWMesO9O295wQGqDA9Y58zinDyX7Qwas-w3ITvBMZRF-rG1TysOQkdDwkXhliaE-iYqL_JgTYahXJDy_LJYJ3b889dYmJMkMhtzvDk_dqeEabuX8XUp1QvWccxJ1Xg5qbE_doSv5v_MX7y5b-g==", "test")

b-bittner commented 2 years ago

Hello @geertvanheusden, thanks for posting this issue. Can you be a bit more specific about your problem? At the moment I can't figure out how this is related to the plugin code.

geertvanheusden commented 2 years ago

Hi @b-bittner,

The following line will always fail with a "403: Forbidden" when you use the https://kroki.io service: https://github.com/AVATEAM-IT-SYSTEMHAUS/mkdocs-kroki-plugin/blob/0a613c5fc92b1e9c50d107df365fd599d40c1496/kroki/plugin.py#L119

It does work for the self-hosted version of Kroki, so I assume there is some kind of API gateway or firewall in front of the real service that blocks calls made with the urllib.request.urlretrieve method (see the simple example above), because the user-agent header is set to Python-urllib/3.10.
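For anyone who wants to check which user-agent their own interpreter sends, here is a minimal sketch. It only inspects the headers that urllib's default opener attaches and makes no network call. The helper name `default_user_agent` is made up for illustration; the version suffix will follow whatever Python version runs it.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent value that urllib's default opener attaches."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request
    # made through this opener, unless the request sets the header itself.
    headers = {name.lower(): value for name, value in opener.addheaders}
    return headers["user-agent"]


if __name__ == "__main__":
    # Prints something of the form "Python-urllib/<major>.<minor>".
    print(default_user_agent())
```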

You will see the same result when you execute a curl command with the above header:

curl -vvv  -H 'user-agent: Python-urllib/3.10'  https://kroki.io/plantuml/svg/eNptU11v2kAQfL9fseUFIsW47VujCMUFGiElVQRJ2tezvYZrz3fW3Rrk_vrumhgalSfuY2Z2ZnzcRdKB2tqq8dw3XTDbHcHnj5--QFbrP95NC19fw8oVU_ABDEXQVWWs0YRxCpm1sBZKhDVGDHssp2q8eVr8TB5MgS5isirRkakMhht4XD3D5BvrlEja2HgNERF2RE28SdOtoV2by8BUH6LVeZTfxBTexaTyIWmsdmI1za3P01pHwpA-rObL75vllVIfjCtsWyLcCt3kafZjM_d17d1MqTG8OFau2QzQDqHy1vqDcVuwxiGQhyIgh4Jo6saK3xL2Bg9MvKS7OaFmFwbfo8OgbfrClcTZ__ePPjcW0-xpdc8jD7q7gNlg0QZD3bE_6jJXchgeql2B6dxvnSF_gSegljB90HVe6guAhSad64jponO69ouvnMBiRdLB8fOXJmBBxjul-gST6NtQIH-u0XLPbuKIV7WxliERfAWtoEZX6hxosvfE5fIBQ1_7NfBGiAIGvkZhvOWYyGHW0o7vZSTIWnIXWnwI7dehdxjbvDZ04h9TTrZ94YTP_je6QWMesO9O295wQGqDA9Y58zinDyX7Qwas-w3ITvBMZRF-rG1TysOQkdDwkXhliaE-iYqL_JgTYahXJDy_LJYJ3b889dYmJMkMhtzvDk_dqeEabuX8XUp1QvWccxJ1Xg5qbE_doSv5v_MX7y5b-g==

b-bittner commented 2 years ago

Hi @geertvanheusden, I can reproduce your problem with curl and the simple sample snippet in your first post. I also decoded your URL and recreated the UML code.

PlantUML:

```
@startuml
'Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
'SPDX-License-Identifier: MIT (For details, see https://github.com/awslabs/aws-icons-for-plantuml/blob/master/LICENSE)

!include
' Uncomment the following line to create simplified view
' !include
!include
!include
!include
!include
!include

left to right direction

Users(sources, "Events", "millions of users")
APIGateway(votingAPI, "Voting API", "user votes")
Cognito(userAuth, "User Authentication", "jwt to submit votes")
Lambda(generateToken, "User Credentials", "return jwt")
Lambda(recordVote, "Record Vote", "enter or update vote per user")
DynamoDB(voteDb, "Vote Database", "one entry per user")

sources --> userAuth
sources --> votingAPI
userAuth <--> generateToken
votingAPI --> recordVote
recordVote --> voteDb
@enduml
```

But when I'm running it in Kroki, it is always successful. Sorry, at the moment I can't reproduce your issue. Have you updated all packages in your virtualenv or installation? Do you have more details on your environment?

geertvanheusden commented 2 years ago

Hi @b-bittner, could you please clarify what you mean by "But when I'm running it in Kroki, it is always successful"?

Well, the fact that you are able to reproduce it with the sample snippet, which is almost literally a copy of the code referred to in my previous post, indicates that something on the Kroki server side is blocking the request with a 403.

Based on my research, that is because of the user-agent header set by the Python code: urllib.request.urlretrieve(url, target / filename).

Which is identical to:

curl -vvv  -H 'user-agent: Python-urllib/3.10'  https://kroki.io/plantuml/svg/eNptU11v2kAQfL9fseUFIsW47VujCMUFGiElVQRJ2tezvYZrz3fW3Rrk_vrumhgalSfuY2Z2ZnzcRdKB2tqq8dw3XTDbHcHnj5--QFbrP95NC19fw8oVU_ABDEXQVWWs0YRxCpm1sBZKhDVGDHssp2q8eVr8TB5MgS5isirRkakMhht4XD3D5BvrlEja2HgNERF2RE28SdOtoV2by8BUH6LVeZTfxBTexaTyIWmsdmI1za3P01pHwpA-rObL75vllVIfjCtsWyLcCt3kafZjM_d17d1MqTG8OFau2QzQDqHy1vqDcVuwxiGQhyIgh4Jo6saK3xL2Bg9MvKS7OaFmFwbfo8OgbfrClcTZ__ePPjcW0-xpdc8jD7q7gNlg0QZD3bE_6jJXchgeql2B6dxvnSF_gSegljB90HVe6guAhSad64jponO69ouvnMBiRdLB8fOXJmBBxjul-gST6NtQIH-u0XLPbuKIV7WxliERfAWtoEZX6hxosvfE5fIBQ1_7NfBGiAIGvkZhvOWYyGHW0o7vZSTIWnIXWnwI7dehdxjbvDZ04h9TTrZ94YTP_je6QWMesO9O295wQGqDA9Y58zinDyX7Qwas-w3ITvBMZRF-rG1TysOQkdDwkXhliaE-iYqL_JgTYahXJDy_LJYJ3b889dYmJMkMhtzvDk_dqeEabuX8XUp1QvWccxJ1Xg5qbE_doSv5v_MX7y5b-g==

If you change the header to, for example, Chrome, it does work:

curl -vvv  -H 'user-agent: Chrome'  https://kroki.io/plantuml/svg/eNptU11v2kAQfL9fseUFIsW47VujCMUFGiElVQRJ2tezvYZrz3fW3Rrk_vrumhgalSfuY2Z2ZnzcRdKB2tqq8dw3XTDbHcHnj5--QFbrP95NC19fw8oVU_ABDEXQVWWs0YRxCpm1sBZKhDVGDHssp2q8eVr8TB5MgS5isirRkakMhht4XD3D5BvrlEja2HgNERF2RE28SdOtoV2by8BUH6LVeZTfxBTexaTyIWmsdmI1za3P01pHwpA-rObL75vllVIfjCtsWyLcCt3kafZjM_d17d1MqTG8OFau2QzQDqHy1vqDcVuwxiGQhyIgh4Jo6saK3xL2Bg9MvKS7OaFmFwbfo8OgbfrClcTZ__ePPjcW0-xpdc8jD7q7gNlg0QZD3bE_6jJXchgeql2B6dxvnSF_gSegljB90HVe6guAhSad64jponO69ouvnMBiRdLB8fOXJmBBxjul-gST6NtQIH-u0XLPbuKIV7WxliERfAWtoEZX6hxosvfE5fIBQ1_7NfBGiAIGvkZhvOWYyGHW0o7vZSTIWnIXWnwI7dehdxjbvDZ04h9TTrZ94YTP_je6QWMesO9O295wQGqDA9Y58zinDyX7Qwas-w3ITvBMZRF-rG1TysOQkdDwkXhliaE-iYqL_JgTYahXJDy_LJYJ3b889dYmJMkMhtzvDk_dqeEabuX8XUp1QvWccxJ1Xg5qbE_doSv5v_MX7y5b-g==

So I am not saying it is a bug in this library, but rather something on the Kroki server side that doesn't apply when you host it yourself. A workaround could be provided, though, by passing a different user-agent header. Up to you to decide ;-)
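A rough sketch of what such a workaround might look like, assuming the download is done with urllib.request.urlopen and an explicit Request instead of urlretrieve. The helper names `build_request` and `fetch_to_file` are made up for illustration and are not taken from the plugin; "Chrome" is just the value that worked in the curl test above.

```python
import shutil
import urllib.request


def build_request(url: str, user_agent: str = "Chrome") -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    # Headers set on the Request take precedence over the opener's default
    # "Python-urllib/x.y" value.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_to_file(url: str, target: str, user_agent: str = "Chrome") -> None:
    """Download url to target, sending the given User-Agent (hypothetical helper)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        # Stream the response body to disk without loading it all into memory.
        shutil.copyfileobj(response, out)
```

Calling `fetch_to_file(url, "test")` with the URL from the first post would then be the equivalent of the second curl command, with the user-agent value as a parameter.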

Hoss3770 commented 1 year ago

I can confirm the same problem

b-bittner commented 1 year ago

I've merged #27 into main, but it is not yet part of a release. Maybe someone can confirm that it fixes the issue and doesn't create new ones.

oniboni commented 2 months ago

This should not be an issue any more, but:

You can manually set the user agent via:

  - kroki:
      UserAgent: 'Chrome'

Sorry, I forgot to document this option, ref #54