ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License

Explore and customize LD-VOWL tool source code to better fit our needs #57

Closed hyi closed 7 years ago

hyi commented 7 years ago

I have downloaded the LD-VOWL source code and built and installed the tool on my computer. However, the locally installed LD-VOWL, running on the webpack development server, appears to have trouble accessing external SPARQL endpoints, which needs to be resolved next. I have also read the two published papers on LD-VOWL and understand how the tool works.

I think the main reason for the unsatisfactory schema extraction from our Blazegraph endpoints is that LD-VOWL's generic SPARQL query returns only the classes with the most instances, with a default upper limit of 10, so only 10 classes are returned and shown as circular nodes in the visualization. According to the papers, the limit exists because SPARQL endpoints can impose strict execution-time limits, so the query must not be too complex. Additionally, the visualization would not be interactive if a SPARQL query took too long to return results.

I will explore along these lines and see whether raising this default from 10 to a larger but still acceptable number would be useful for us. It may also be useful to extract the SPARQL-based schema extraction code into a nightly job, so that it can take as much time as needed to assemble the semantic relationships for the interactive visualization to use.
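For reference, a generic "classes with the most instances" query of the kind described above could look roughly like the sketch below. The query text and the function name `build_top_classes_query` are assumptions for illustration and are not taken from the LD-VOWL source; only the default limit of 10 comes from the tool.

```python
def build_top_classes_query(limit: int = 10) -> str:
    """Build a SPARQL query for the `limit` classes with the most instances.

    Hypothetical helper: this is a plain SPARQL 1.1 aggregate query,
    not the query LD-VOWL actually sends.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?class (COUNT(?instance) AS ?instanceCount)\n"
        "WHERE { ?instance a ?class . }\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?instanceCount)\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Raising the limit from the default 10 to 20 only changes the last line.
    print(build_top_classes_query(20))
```

Because the query aggregates over every typed instance, a larger limit does not make the counting itself cheaper or more expensive; the extra cost mostly comes from the follow-up queries needed for each additional class.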

stevencox commented 7 years ago

That's interesting, @hyi. The query timeout in Blazegraph is configurable. While it would probably not be good to set it higher over the long term, perhaps we could run it with a high timeout for a period of time to let you collect the data you need (perhaps saved as JSON objects?), then set it back to normal. Just let me know when you have an idea of what would be most helpful.
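As a rough sketch of the "saved as JSON objects" idea, the snippet below flattens a response in the standard SPARQL 1.1 Query Results JSON format into a small list and writes it to disk. The variable names `class` and `instanceCount`, the function names, and the file layout are assumptions for illustration, not an agreed format.

```python
import json
from pathlib import Path


def bindings_to_class_counts(results: dict) -> list:
    """Flatten SPARQL JSON results into [{"class": ..., "instances": ...}, ...].

    Assumes the query projected ?class and ?instanceCount (hypothetical names).
    """
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append(
            {
                "class": binding["class"]["value"],
                # Literal values arrive as strings in the SPARQL JSON format.
                "instances": int(binding["instanceCount"]["value"]),
            }
        )
    return rows


def save_snapshot(rows: list, path: str) -> None:
    """Write the collected rows to a JSON file for later use by a front end."""
    Path(path).write_text(json.dumps(rows, indent=2), encoding="utf-8")


def load_snapshot(path: str) -> list:
    """Read a snapshot previously written by save_snapshot."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A one-off run with the raised timeout could call `save_snapshot` once per query, after which the visualization would read the files instead of hitting the endpoint.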

hyi commented 7 years ago

@stevencox Agreed, will let you know after I have explored it a bit more.

hyi commented 7 years ago

Worked with @stevencox and Chris Rutledge to make the Blazegraph SPARQL endpoint accessible from my local LD-VOWL development server, which can already access the DBpedia SPARQL endpoint. We found the culprit is the CORS configuration: the 'Access-Control-Allow-Origin' header does not allow access from my localhost development server. Chris has been working on configuring the nginx server to allow access from my localhost, but we have not been successful yet.
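To make the CORS failure concrete, the sketch below mimics the basic check a browser applies to the 'Access-Control-Allow-Origin' response header for a request without credentials: the header must be either `*` or an exact match for the requesting origin. The function name and the origin `http://localhost:8080` are assumptions for illustration; the actual port of the dev server is not stated here.

```python
from typing import Optional


def origin_allowed(allow_origin_header: Optional[str], origin: str) -> bool:
    """Return True if the header value would let `origin` read the response.

    Simplified model of the browser check for non-credentialed requests:
    a missing header blocks the response, "*" allows any origin, and any
    other value must match the origin exactly (scheme, host, and port).
    """
    if allow_origin_header is None:
        return False
    value = allow_origin_header.strip()
    return value == "*" or value == origin


if __name__ == "__main__":
    dev_origin = "http://localhost:8080"  # hypothetical dev server origin
    print(origin_allowed(None, dev_origin))
    print(origin_allowed("https://example.org", dev_origin))
    print(origin_allowed("*", dev_origin))
```

Under this model, an endpoint that sends no header, or one naming a different origin, is blocked from the localhost page even though the same query works from a server-side client.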

I did increase the default number of classes with the most instances returned by LD-VOWL's generic SPARQL query from 10 to 20 on their public server (this number is configurable in the tool's settings), pointing it to our Blazegraph SPARQL endpoint. It took much longer to retrieve results, but it did produce a slightly more detailed class relationship graph, as expected. Steve mentioned we need to consider JSON-LD and see whether we can make the tool visualize JSON-LD/smartAPI endpoints. JSON-LD appears simple enough to parse. There is a json-ld.js library that can be leveraged to parse JSON-LD data and feed it into D3 for visualization. I will work with @stevencox to see if there is a JSON-LD smartAPI endpoint I can use to test and experiment. I can also customize the SPARQL endpoints in LD-VOWL as needed. Running the SPARQL-based data collection as an offline job that provides data to the visualization front end could also be a direction to pursue.
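As a minimal sketch of the JSON-LD idea, the snippet below turns a flattened JSON-LD document into a nodes-and-links structure of the kind a D3 force layout typically consumes. It does not use json-ld.js; it assumes the input is already flattened (node objects under `@graph`, references given as `{"@id": ...}`), and the function name and output field names are illustrative choices.

```python
def jsonld_to_graph(doc: dict) -> dict:
    """Convert flattened JSON-LD into {"nodes": [...], "links": [...]}.

    Assumption: every node object carries an "@id", and a property value
    that is an object with an "@id" is a reference to another node.
    Literal values and JSON-LD keywords (keys starting with "@") are skipped.
    """
    node_objects = doc.get("@graph", [doc])
    nodes = {}
    links = []
    for obj in node_objects:
        source = obj.get("@id")
        if source is None:
            continue  # blank nodes without identifiers are ignored here
        nodes.setdefault(source, {"id": source})
        for prop, values in obj.items():
            if prop.startswith("@"):
                continue
            if not isinstance(values, list):
                values = [values]
            for value in values:
                if isinstance(value, dict) and "@id" in value:
                    target = value["@id"]
                    nodes.setdefault(target, {"id": target})
                    links.append({"source": source, "target": target, "label": prop})
    return {"nodes": list(nodes.values()), "links": links}
```

Real JSON-LD from an endpoint would normally need to be expanded or flattened with a proper JSON-LD processor first, since compact documents can nest nodes and use context-defined aliases that this sketch does not handle.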

It looks like the LD-VOWL tool cannot be customized in time to be useful for the hackathon, but I will continue to explore the code base so that it can be customized for our purposes in the future.

hyi commented 7 years ago

Closing this for now, as I am starting to take on the exposures API work and don't think I will have time for this exploratory work in the near future.