RMLio / RML-Mapper

Generate High Quality Linked Data from multiple originally (semi-)structured data (legacy)
http://RML.io

Strange problems using the processor via the Java API #18

Open seralf opened 7 years ago

seralf commented 7 years ago

Hi

I'm currently trying to use the processor inside a data management workflow for a small POC: we are using CSVW as the data source (a CSV file of about 5 MB, exposed on an "internal" HTTP endpoint).

For my tests, calling the processor from the shell works fine, but when I call it from a Java wrapper (run from Eclipse, built with Maven) it sometimes finishes without any error at all and yet produces less data! Could there be issues with dependencies, memory configuration, bad multithreading, or any other requirement I should be aware of and might have missed? What would you suggest checking? Alternatively, could you suggest how to wrap the processor from Java, so that I can look for what I'm missing myself?
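Since the shell invocation works, one way to wrap the processor from Java without sharing its classpath is to launch the CLI jar as a child process and wait for it to exit. Below is a minimal sketch, assuming the jar is started as `java -jar RML-Mapper.jar -m <mapping> -o <output>`: the `-m`/`-o` flags, the jar path and the class name `RmlCliRunner` are assumptions, so check them against the usage message of the build you run. `-Xmx` is a standard JVM option, included because memory configuration is one of the suspects.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RmlCliRunner {

    // Builds "java [heap] -jar <jar> -m <mapping> -o <output>".
    // ASSUMPTION: "-m" and "-o" are the mapper's mapping/output flags; verify
    // them against the usage message of the jar you actually run.
    static List<String> buildCommand(String javaBin, String heapOption, String jarPath,
                                     String mappingPath, String outputPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        if (heapOption != null && !heapOption.isEmpty()) {
            cmd.add(heapOption); // e.g. "-Xmx2g", a standard JVM option
        }
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-m");
        cmd.add(mappingPath);
        cmd.add("-o");
        cmd.add(outputPath);
        return cmd;
    }

    // Starts the command in a separate JVM, appends its stdout and stderr to
    // logFile, and blocks until the process exits, returning its exit code.
    static int run(List<String> cmd, File logFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Reuse the JVM that runs this wrapper; the paths below are hypothetical.
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = buildCommand(javaBin, "-Xmx2g", "some/path/RML-Mapper.jar",
                "some/path/my-mappingrml.ttl", "some/other/path/dump.ttl");
        int exitCode = run(cmd, new File("rml-mapper.log"));
        System.out.println("RML-Mapper exited with code " + exitCode);
    }
}
```

Running the mapper out of process costs one JVM start per run, but the output file is complete (or the exit code and log say why not) by the time `waitFor()` returns, and the mapper's bundled dependencies never meet yours.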

Thanks in advance.

seralf commented 7 years ago

For example, I used the following sequence of calls (reconstructed from the calls made in Main.main on GitHub):

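package mypackage;

import java.io.File;
import java.nio.file.Paths;
import java.util.Map;

import org.openrdf.repository.Repository;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.Rio;

// import ... (RML-Mapper classes: RMLDocRetrieval, StdRMLMappingFactory, RMLMapping,
//             RMLEngine, StdRMLEngine, RMLDataset; packages depend on the mapper version)

public class ExampleMainFromJava {

  public static void main(String[] args) {

    try {
      // TODO: delete repositories folder

      File mappingFile = Paths.get("some/path/my-mappingrml.ttl").toFile();
      File outputFile = Paths.get("some/other/path/dump.ttl").toFile();

      // load the mapping document into an in-memory Sesame repository
      RMLDocRetrieval mapDocRetrieval = new RMLDocRetrieval();
      final Repository repository = mapDocRetrieval.getMappingDoc(mappingFile.toString(), RDFFormat.TURTLE);

      // check the mapping before using it
      if (repository == null) {
        System.err.println("Problem retrieving the RML Mapping Document");
        System.exit(1);
      }
      // HERE I could work with the internal repository (for example to synchronize data on my endpoint)

      StdRMLMappingFactory mappingFactory = new StdRMLMappingFactory(); // skolemization?
      RMLMapping mapping = mappingFactory.extractRMLMapping(repository);

      String graphName = "";
      Map<String, String> parameters = null;
      String[] exeTriplesMap = null;

      // the output format is derived from the file extension (.ttl -> Turtle);
      // Rio returns null for an unknown extension, so fail early instead of an NPE
      RDFFormat format = Rio.getWriterFormatForFileName(outputFile.toString());
      if (format == null) {
        System.err.println("Unsupported output file extension: " + outputFile);
        System.exit(1);
      }
      String outputFormat = format.getName().toLowerCase();

      RMLEngine engine = new StdRMLEngine(outputFile.toString());

      // exploded version of engine.run(...)
      final RMLDataset runningDataset = engine.chooseSesameDataSet("dataset", outputFile.toString(), outputFormat);
      engine.runRMLMapping(runningDataset, mapping, graphName, parameters, exeTriplesMap);

      // close the repository
      runningDataset.closeRepository();

      Thread.sleep(2000);
      System.exit(0);

    } catch (Throwable e) {
      System.err.println("\n\n\n\nSOMETHING WENT WRONG!");
      e.printStackTrace(System.err);
      System.exit(1); // report the failure to the caller instead of ending normally
    }
  }

}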
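To narrow down the "less data" symptom, it may help to diff the dump produced from the shell against the one produced by the Java wrapper. Below is a minimal, stdlib-only sketch (the class name `DumpDiff` is hypothetical) that lists statement lines present in a reference file but missing from a candidate file. It works best with N-Triples, one triple per line: since the code above picks the writer from the file extension via `Rio.getWriterFormatForFileName`, naming the output `dump.nt` should select N-Triples. Abbreviated Turtle, or blank-node labels that differ between runs, will show up as spurious differences.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DumpDiff {

    // Reads the trimmed, non-blank lines of a dump, skipping comments and @prefix lines.
    static Set<String> statementLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .map(String::trim)
                .filter(l -> !l.isEmpty() && !l.startsWith("#") && !l.startsWith("@prefix"))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Lines present in the reference dump but absent from the candidate dump,
    // in the order they appear in the reference.
    static List<String> missingLines(Path reference, Path candidate) throws IOException {
        Set<String> actual = statementLines(candidate);
        return statementLines(reference).stream()
                .filter(l -> !actual.contains(l))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path shellDump = Paths.get(args[0]); // dump produced from the shell
        Path javaDump = Paths.get(args[1]);  // dump produced by the Java wrapper
        List<String> missing = missingLines(shellDump, javaDump);
        System.out.println(missing.size() + " line(s) missing from " + javaDump);
        missing.stream().limit(20).forEach(System.out::println);
    }
}
```

If the missing lines all come from the same triples map or from the tail of the CSV, that points at the source or at output flushing rather than at the mapping itself.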
canarvaeza commented 6 years ago

Hey, did you solve this? Do you have some code showing how to use the processor in your own program?

seralf commented 6 years ago

Hi, no, I didn't resolve the issue; it's rather weird.

However, if you want to build a small POC that reuses the processor from Java, you could take a look at the code above.

canarvaeza commented 6 years ago

I have a web app built with Maven. Can you help me figure out how to reuse the code? Should I copy the folders into my project structure (to use the classes), or what is the best approach?

seralf commented 6 years ago

We should ask the authors :-)

Anyway, I managed to embed the library in my own code, using the code above, for a POC in which I need to handle the mapping directly instead of going through the console. You need the jar on the classpath to test it. In my case it caused some problems with Maven: it is a shaded jar with its dependencies bundled in, so it can conflict with other dependencies in your pom. In that case the best option is to recompile the jar (or "brutally" remove from it the packages you already have, but that is a rather hackish solution).
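To check whether the shaded jar really clashes with a dependency from your pom, you can ask the class loader for every classpath location that provides a given class. Below is a minimal stdlib sketch (the class name `ClasspathDuplicates` is hypothetical); `org.openrdf.rio.Rio` is only an example default, assumed to be one of the Sesame classes the mapper bundles, so pass whichever class names you suspect as arguments.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    // Returns every classpath location (jar or directory) that provides the class.
    // More than one entry means two artifacts ship the same class, and the one
    // that comes first on the classpath silently wins.
    static List<URL> locationsOf(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return new ArrayList<>(Collections.list(loader.getResources(resource)));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathDuplicates.class.getClassLoader();
        }
        // Example default (assumed Sesame class); pass your own suspects as arguments.
        String[] suspects = args.length > 0 ? args : new String[] {"org.openrdf.rio.Rio"};
        for (String name : suspects) {
            List<URL> urls = locationsOf(name, loader);
            System.out.println(name + " found in " + urls.size() + " location(s)");
            urls.forEach(u -> System.out.println("  " + u));
        }
    }
}
```

Running it inside the web app (same class loader as the mapper) is what matters: a class reported in both the shaded jar and another dependency is a candidate for the conflict.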

I hope this was helpful.

(Email reply to Cristian Narvaez's comment of 2018-03-27 22:10 GMT+02:00.)