Informatievlaanderen / VSDS-Linked-Data-Interactions

https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/
European Union Public License 1.2

ERROR 1 --- [pool-5-thread-1] b.v.i.ldes.ldio.HttpInputPoller with the example https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/ldio/examples/ex2-scrape-api. Special character ^ #432

Closed: xdxxxdx closed this issue 4 months ago

xdxxxdx commented 8 months ago

Scenario

  1. Follow the step-by-step guide at https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/ldio/examples/ex2-scrape-api.

Current result: with ldes/ldi-orchestrator:1.11.0-SNAPSHOT, LDIO runs into an error:

2023-12-14 15:09:19    _       ___     ___              ___                    _                        _                       _
2023-12-14 15:09:19   | |     |   \   |_ _|     o O O  / _ \     _ _    __    | |_      ___     ___    | |_      _ _   __ _    | |_     ___      _ _
2023-12-14 15:09:19   | |__   | |) |   | |     o      | (_) |   | '_|  / _|   | ' \    / -_)   (_-<    |  _|    | '_| / _` |   |  _|   / _ \    | '_|
2023-12-14 15:09:19   |____|  |___/   |___|   TS__[O]  \___/   _|_|_   \__|_  |_||_|   \___|   /__/_   _\__|   _|_|_  \__,_|   _\__|   \___/   _|_|_
2023-12-14 15:09:19 _|"""""|_|"""""|_|"""""| {======|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|
2023-12-14 15:09:19 "`-0-0-'"`-0-0-'"`-0-0-'./o--000'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'
2023-12-14 15:09:19 
2023-12-14 15:09:19 Version 
2023-12-14 15:09:19 Powered by Spring Boot 3.1.3
2023-12-14 15:09:19 
2023-12-14 15:09:19 2023-12-14T14:09:19.980Z  INFO 1 --- [           main] b.v.i.ldes.ldio.Application              : Starting Application using Java 18-ea with PID 1 (/ldio/ldio-application.jar started by ldio in /ldio)
2023-12-14 15:09:19 2023-12-14T14:09:19.989Z  INFO 1 --- [           main] b.v.i.ldes.ldio.Application              : No active profile set, falling back to 1 default profile: "default"
2023-12-14 15:09:27 2023-12-14T14:09:27.484Z  WARN 1 --- [           main] org.apache.sis.system                    : The “SIS_DATA” environment variable is not set.
2023-12-14 15:09:29 2023-12-14T14:09:29.697Z  INFO 1 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 1 endpoint(s) beneath base path '/actuator'
2023-12-14 15:09:31 2023-12-14T14:09:31.151Z  INFO 1 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2023-12-14 15:09:31 2023-12-14T14:09:31.211Z  INFO 1 --- [           main] b.v.i.ldes.ldio.Application              : Started Application in 12.392 seconds (process running for 14.278)
2023-12-14 15:09:31 2023-12-14T14:09:31.731Z ERROR 1 --- [pool-5-thread-1] b.v.i.ldes.ldio.HttpInputPoller          : ERROR - when='data', problem='Could not generate a valid iri from term lexical form [https://originassets.akamaized.net/origin-com-store-final-assets-prod/196787/142.0x200.0/1100749_MB_142x200_en_WW_^_2022-10-31-12-47-37_9cac194bb30aacd42504c01b4cf0c3db9be4f724.jpg] as-is, or prefixed with base iri [http://example.com/base/]'
2023-12-14 15:09:31 2023-12-14T14:09:31.738Z ERROR 1 --- [pool-5-thread-1] b.v.i.ldes.ldio.HttpInputPoller          : Could not generate a valid iri from term lexical form [https://originassets.akamaized.net/origin-com-store-final-assets-prod/196787/142.0x200.0/1100749_MB_142x200_en_WW_^_2022-10-31-12-47-37_9cac194bb30aacd42504c01b4cf0c3db9be4f724.jpg] as-is, or prefixed with base iri [http://example.com/base/]

Nothing is printed to the console.

If I change url: https://www.cheapshark.com/api/1.0/deals?pageSize=1000 to url: https://www.cheapshark.com/api/1.0/deals?pageSize=5, no issue occurs.
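A plausible explanation for the difference is that the smaller page simply does not contain the offending record. A minimal Python sketch for checking this outside LDIO follows; it scans records for `thumb` values containing characters that the Turtle IRIREF grammar does not allow in an IRI. The function name `find_invalid_thumbs` and the sample records are hypothetical, and the sketch assumes each record is a dictionary with a `thumb` field.

```python
import re

# Characters the Turtle/SPARQL IRIREF grammar forbids inside an IRI:
# control characters and space, plus < > " { } | ^ ` and backslash.
ILLEGAL_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def find_invalid_thumbs(deals):
    """Return (index, thumb, offending characters) for each suspicious record."""
    problems = []
    for index, deal in enumerate(deals):
        thumb = deal.get("thumb", "")
        bad = sorted(set(ILLEGAL_IRI_CHARS.findall(thumb)))
        if bad:
            problems.append((index, thumb, bad))
    return problems


# Hypothetical sample records standing in for the API response.
sample_deals = [
    {"title": "Game A", "thumb": "https://example.com/img/a.jpg"},
    {"title": "Game B", "thumb": "https://example.com/img/b_^_2022.jpg"},
]

for index, thumb, bad in find_invalid_thumbs(sample_deals):
    print(f"record {index}: {thumb} contains {bad}")
```

Running the same check against the real response for both page sizes would show whether only the larger page includes a URL with `^`.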

Expected result: HttpInputPoller should be able to process special characters such as ^.

Tomvbe commented 4 months ago

This is not an issue with the HttpInputPoller or with the RmlAdapter. The consumed data is simply invalid: https://originassets.akamaized.net/origin-com-store-final-assets-prod/196787/142.0x200.0/1100749_MB_142x200_en_WW_^_2022-10-31-12-47-37_9cac194bb30aacd42504c01b4cf0c3db9be4f724.jpg is not a valid URL, because the ^ should have been percent-encoded (as %5E). A possible fix for the documentation example would be to adjust the RML mapping and define thumb as rr:Literal instead of rr:IRI.
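To illustrate what "should have been encoded" means, here is a minimal Python sketch that percent-encodes the path of a URL while leaving the scheme, host, and any existing percent-escapes untouched. The helper name `encode_path` is hypothetical, and this is not something LDIO does itself; it only shows what a producer or an upstream cleaning step could do before the data reaches the pipeline.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url):
    """Percent-encode characters in the URL path that are not allowed unescaped."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so existing escapes are not doubled.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


# The URL reported in the error message above.
raw = (
    "https://originassets.akamaized.net/origin-com-store-final-assets-prod/"
    "196787/142.0x200.0/1100749_MB_142x200_en_WW_^_2022-10-31-12-47-37_"
    "9cac194bb30aacd42504c01b4cf0c3db9be4f724.jpg"
)

encoded = encode_path(raw)
print(encoded)
```

With the `^` replaced by `%5E`, the value no longer contains a character that the IRI grammar rejects. Mapping thumb as a literal, as suggested above, avoids the problem without altering the data.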