blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

After successful load, created named graph only visible with delay #193

Open drth80 opened 3 years ago

drth80 commented 3 years ago

Hi,

We're using Blazegraph as a backend. Our frontend component (among others) triggers loads via LOAD <...> INTO GRAPH <...> statements.

After resolving initial timeout issues with the load balancer, we now get success messages for "large-ish" loads as well, e.g. for a ~420 MB ttl file, the LOAD call returns after 2-4 minutes.

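However, the resulting named graph sometimes does not show up. Examples:

  • loaded approx. 320 MB ttl in 4 min - graph visible instantaneously
  • loaded approx. 460 MB ttl in 2.5 min - named graph not returned when querying for all named graphs. When I re-checked about 2 hours later, the graph was suddenly there.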

Is this known behaviour? Is there an explanation, and possibly a workaround? Loaded data not showing up tends to trigger people to re-load, re-load...

Thomas

thompsonbry commented 3 years ago

Data should be immediately available at the commit. Depending on the operation and the backend storage mode, the commit can have significant latency as dirty pages are flushed.

The data is not visible until the commit. I suspect that you might be relying on the monitor mode to tell you when the load finished but not waiting for the commit. The SPARQL UPDATE is not complete until the HTTP request returns the commit indicator.
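To make "wait for the HTTP request to return" concrete, here is a minimal Python sketch of a client that sends the LOAD as a SPARQL 1.1 Protocol update and reports success only after the response has ended. The endpoint URL, the function names, and the timeout value are assumptions for illustration, not taken from this thread; adjust them to your deployment.

```python
import urllib.parse
import urllib.request

# Assumption: a local Blazegraph SPARQL endpoint; replace with your own URL.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_load_request(source_url, graph_iri, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol update request (POST, form-encoded)."""
    update = f"LOAD <{source_url}> INTO GRAPH <{graph_iri}>"
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def load_and_wait(source_url, graph_iri, timeout_s=3600):
    """Send the LOAD and block until the server has finished the response.

    urlopen raises HTTPError for 4xx/5xx responses and URLError or a
    timeout error for connection problems, so reaching the return
    statement means the whole response arrived with a success status.
    timeout_s is a per-read socket timeout (hypothetical value).
    """
    request = build_load_request(source_url, graph_iri)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        response.read()  # drain the body; return only after it has ended
        return response.status
```

A caller would treat the load as finished only when `load_and_wait(...)` returns, and not when any intermediate progress output shows up. Any load balancer in between needs an idle timeout longer than the load itself.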

Bryan

On Thu, Feb 18, 2021 at 05:53 Thomas notifications@github.com wrote:

Hi,

we're using Blazegraph as a backend. Our frontend component (among others) triggers loads via LOAD <...> INTO GRAPH <...> statements.

After resolving initial timeout issues with the load balancer, we now manage to get success messages also for "large-ish" load, e.g. for ~420 MB ttl file, LOAD call returns after 2-4 minutes.

However, we sometimes have issues with the resulting named graph not showing up. Examples:

  • loaded approx. 320 MB ttl in 4 min - graph visible instantaneously
  • loaded approx. 460 MB ttl in 2.5 min - named graph not returned when querying for all named graphs. When I re-checked about 2 hours later, the graph was suddenly there.

Is this known behaviour? Is there an explanation, and possibly a workaround? Loaded data not showing up tends to trigger people to re-load, re-load...

Thomas

drth80 commented 3 years ago

Hi Bryan, thanks for your response. Do you have a pointer to where I can read about the monitor mode? I would then use this information to check back with the tool vendor.

thompsonbry commented 3 years ago

The wiki for Blazegraph is here:

https://github.com/blazegraph/database/wiki

I do not see the option I am thinking of described under the REST API:

https://github.com/blazegraph/database/wiki/REST_API

It appears on the workbench as a [ ] Monitor option right next to the [ ] Explain option. It is just a URL request parameter which gets passed along and instructs the engine to provide incremental feedback as data are processed during a load, etc. However, the commit does not come until the end of the SPARQL UPDATE request. The application must wait for that.

That would be my guess: monitor is on, and the requester is assuming that the data is durable as soon as the LOAD is reported as done. But they need to wait for the commit.
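Once the UPDATE request has returned, a client can also confirm that the named graph is visible before telling users the load succeeded, which may help avoid needless re-loads. The following is a rough Python sketch using a standard SPARQL ASK query and the SPARQL JSON results format. The endpoint URL and all function names are assumptions for illustration, not part of Blazegraph's documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Blazegraph SPARQL endpoint; replace with your own URL.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_graph_check_request(graph_iri, endpoint=ENDPOINT):
    """Build a GET request asking whether the graph holds any triple."""
    query = f"ASK {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_ask_result(payload):
    """Extract the boolean from a SPARQL JSON ASK result document."""
    return bool(json.loads(payload)["boolean"])


def graph_is_visible(graph_iri, timeout_s=60):
    """Return True if the named graph is visible to queries right now."""
    request = build_graph_check_request(graph_iri)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return parse_ask_result(response.read().decode("utf-8"))
```

If `graph_is_visible(...)` returns False after the UPDATE request has returned successfully, that points to something other than the client giving up too early and is worth reporting with the exact request and response.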

Bryan

drth80 commented 3 years ago

Thanks a lot, Bryan!