Informatievlaanderen / VSDS-Linked-Data-Interactions

https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/
European Union Public License 1.2

SPARQL construct NiFi processor is broken when producing one-to-many messages using the graph splitting technique #664

Open rorlic opened 1 month ago

rorlic commented 1 month ago

When a SPARQL Construct processor is configured to split a linked data model using graphs, errors are thrown that prevent the pipeline from continuing.
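For readers unfamiliar with the graph splitting technique, here is a minimal sketch of the idea: the query output places each member's triples in its own named graph, and each graph then becomes one outgoing message. The class `GraphSplitSketch`, the `Quad` record, and the `splitByGraph` method are hypothetical names used only for illustration; this is not the processor's actual implementation and it uses no NiFi or RDF library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the SparqlConstructProcessor's code.
class GraphSplitSketch {

    // A quad is a triple plus the name of the graph it belongs to.
    record Quad(String subject, String predicate, String object, String graph) {}

    // Groups quads by graph name and renders each group as one message of
    // plain triples, one per line. Insertion order of graphs is preserved.
    static Map<String, String> splitByGraph(List<Quad> quads) {
        Map<String, StringBuilder> grouped = new LinkedHashMap<>();
        for (Quad q : quads) {
            grouped.computeIfAbsent(q.graph(), g -> new StringBuilder())
                   .append(q.subject()).append(' ')
                   .append(q.predicate()).append(' ')
                   .append(q.object()).append(" .\n");
        }
        Map<String, String> messages = new LinkedHashMap<>();
        grouped.forEach((graph, body) -> messages.put(graph, body.toString()));
        return messages;
    }

    public static void main(String[] args) {
        // Example data with made-up IRIs: two graphs, so two messages.
        List<Quad> quads = List.of(
            new Quad("<http://example.org/a>", "<http://example.org/name>", "\"A\"",
                     "<http://example.org/graph/a>"),
            new Quad("<http://example.org/b>", "<http://example.org/name>", "\"B\"",
                     "<http://example.org/graph/b>"),
            new Quad("<http://example.org/a>", "<http://example.org/size>", "\"1\"",
                     "<http://example.org/graph/a>"));

        splitByGraph(quads).forEach((graph, message) ->
            System.out.println("--- " + graph + "\n" + message));
    }
}
```

In a pipeline, each entry of the returned map would correspond to one output flow file, which is the one-to-many behavior this issue is about.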

To Reproduce (see next comment for test setup)

  1. Download the RML adapter processor and the SPARQL Construct processor into the local NiFi extensions folder

  2. Start NiFi workbench:

    clear
    docker compose up -d --wait

  3. Log on to the NiFi workbench at https://localhost:8443/nifi using the credentials found in the .env file

  4. Import the pipeline (create a process group and browse for this pipeline)

  5. Start the pipeline

  6. Process the data:

    curl -X POST -H "Content-Type: text/csv" http://localhost:8080/pipeline --data-binary @./data.csv

  7. Verify that the SPARQL Construct processor issues errors:

    2024-07-11 09:21:36,928 WARN [Timer-Driven Process Thread-2] o.a.n.controller.tasks.ConnectableTask Processing halted: uncaught exception in Component [SparqlConstructProcessor[id=1e1bc70f-4f13-36fd-61c4-5dffa7f88225]]
    org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=4d37a9ae-4269-434c-ba28-b76f9207db5f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1720689654679-1, container=default, section=1], offset=1627, length=5892],offset=0,name=4d37a9ae-4269-434c-ba28-b76f9207db5f,size=5892] is not known in this session (StandardProcessSession[id=43])
            at org.apache.nifi.controller.repository.StandardProcessSession.validateRecordState(StandardProcessSession.java:3714)
            at org.apache.nifi.controller.repository.StandardProcessSession.validateRecordState(StandardProcessSession.java:3700)
            at org.apache.nifi.controller.repository.StandardProcessSession.transfer(StandardProcessSession.java:2351)
            at be.vlaanderen.informatievlaanderen.ldes.ldi.processors.services.FlowManager.sendRDFToRelation(FlowManager.java:84)
            at be.vlaanderen.informatievlaanderen.ldes.ldi.processors.services.FlowManager.sendRDFToRelation(FlowManager.java:73)
            at be.vlaanderen.informatievlaanderen.ldes.ldi.processors.SparqlConstructProcessor.onTrigger(SparqlConstructProcessor.java:61)
            at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
            at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1274)
            at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:244)
            at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
            at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
            at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
            at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            at java.base/java.lang.Thread.run(Thread.java:1583)

Expected behavior

Multiple flow files should be created and the pipeline should continue.

rorlic commented 1 month ago

Test setup attached: ldio.gh#664.zip