blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Incorrectly pulled data from wikidata #174

Closed cfellicious closed 3 years ago

cfellicious commented 3 years ago

We maintain our own Wikibase, and when we ran our Blazegraph update process for the first time, data was pulled from Wikidata instead of our Wikibase. We now have data from both Wikidata and our Wikibase in the store, which gives incorrect results when we query.

Is there any way to clear Blazegraph and start afresh?

thompsonbry commented 3 years ago

Chris, from what I remember of the wikidata integration, if you have a recent backup you could simply restore it, and the wikidata driver would then roll it forward to the present time.

If you are asking about point-in-time recovery (PITR) for Blazegraph, you can discard the last commit point with the default RWStore configuration, but you cannot roll back to arbitrary points in time. While the RWStore supports this in principle, the overhead of maintaining historical data for some time period is high enough that it is not enabled by default. See https://github.com/blazegraph/database/wiki/RWStore#minreleaseage

Bryan
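
To check how much history a given store is configured to retain, one can look up the minReleaseAge setting in its properties file. The following is a minimal Python sketch, not an official tool. It assumes the fully qualified key is `com.bigdata.service.AbstractTransactionService.minReleaseAge` (confirm the exact name, units, and default on the wiki page linked above), and the sample properties text is hypothetical.

```python
from typing import Optional

# Assumed property key; verify it against the RWStore wiki page.
MIN_RELEASE_AGE_KEY = "com.bigdata.service.AbstractTransactionService.minReleaseAge"


def min_release_age(properties_text: str) -> Optional[str]:
    """Return the raw minReleaseAge value, or None if the key is not set."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == MIN_RELEASE_AGE_KEY:
            return value.strip()
    return None


# Hypothetical excerpt of a Blazegraph properties file.
sample = """
# storage settings
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.service.AbstractTransactionService.minReleaseAge=1
"""

setting = min_release_age(sample)
if setting is None:
    print("minReleaseAge is not set; the store's built-in default applies")
else:
    print(f"minReleaseAge = {setting}")
```

The function returns the raw string rather than interpreting it, since the meaning of particular values should be taken from the wiki page, not from this sketch.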

cfellicious commented 3 years ago

Thank you for the reply. We want to clear the current Blazegraph store and start afresh (sort of like a blank slate).

thompsonbry commented 3 years ago

You can just delete the backing Journal file. It contains all of the durable state. To locate it, check your Blazegraph configuration properties.

Bryan
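
The advice above can be sketched as a small Python script: read the properties file, find the journal path, and delete the file. This is a sketch under stated assumptions, not an official procedure. It assumes the journal location is stored under the key `com.bigdata.journal.AbstractJournal.file`, that a relative path should be resolved against a directory you supply (`base_dir`, a hypothetical parameter), and that Blazegraph has been stopped before the file is removed. The helper names are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Assumed property key; check it against your own properties file.
JOURNAL_FILE_KEY = "com.bigdata.journal.AbstractJournal.file"


def read_property(properties_file: Path, key: str) -> Optional[str]:
    """Return the value of `key` from a simple key=value properties file."""
    for raw_line in properties_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def reset_journal(properties_file: Path, base_dir: Path,
                  dry_run: bool = True) -> Optional[Path]:
    """Locate the journal named in the properties file and delete it.

    Returns the resolved journal path, or None if the key is missing.
    With dry_run=True (the default) nothing is deleted.
    """
    value = read_property(properties_file, JOURNAL_FILE_KEY)
    if value is None:
        return None
    # An absolute value is used as-is; a relative one is joined to base_dir.
    journal = base_dir / value
    if not dry_run and journal.exists():
        journal.unlink()  # removes all durable state held in the journal
    return journal
```

A call such as `reset_journal(Path("RWStore.properties"), Path("."))` only reports which file would be removed; passing `dry_run=False` performs the deletion. The file name `RWStore.properties` here is a placeholder for whatever properties file your deployment uses.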

cfellicious commented 3 years ago

Thank you so much.

thompsonbry commented 3 years ago

As a caution: it will take a while to rebuild the data from the wikidata stream.

Bryan

cfellicious commented 3 years ago

That is okay.