MajidBenam / seshat-3store

Seshat: Global History Databank was founded in 2011 to bring together the most current and comprehensive body of knowledge about human history in one place. The huge potential of this knowledge for testing theories about political and economic development has been largely untapped.

Scoped Values and Data integrity #1

Open GavinMendelGleason opened 2 years ago

GavinMendelGleason commented 2 years ago

I had a go last night at importing all of Equinox and got through it in a few minutes using a slightly different (inferred) schema.

I was using a scoped value definition which had an optional date range and a required epistemic state.

The epistemic state, however, can only carry meaning if it is either inferred or known (and I think that known is not described in this ontology but only implied).
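
As a rough illustration of that shape, here is a minimal sketch in plain Python dataclasses rather than the TerminusDB client. The names `EpistemicState`, `DateRange`, and `ScopedFact` are hypothetical and not taken from either schema; `KNOWN` is spelled out only to show the state that the ontology appears to leave implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EpistemicState(Enum):
    """Hypothetical enumeration of the two states that carry meaning."""
    KNOWN = "known"        # assumption: implied, not declared, in the ontology
    INFERRED = "inferred"


@dataclass(frozen=True)
class DateRange:
    start: int  # year; negative values stand for BCE in this sketch
    end: int

    def __post_init__(self):
        # Reject ranges that run backwards.
        if self.start > self.end:
            raise ValueError("start year must not exceed end year")


@dataclass(frozen=True)
class ScopedFact:
    value: str
    state: EpistemicState              # required
    dates: Optional[DateRange] = None  # optional


dated = ScopedFact("present", EpistemicState.INFERRED, DateRange(-500, -400))
undated = ScopedFact("absent", EpistemicState.KNOWN)
```

Making the state a required field with a closed set of values means a fact can never be stored without saying how it is known, while the date range stays optional.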

I think there are some ontological questions here that I'm unable to answer. My guess is that disputes should be implied by multiple values that cover the same period, in which case a dispute is a fact of data collection rather than additional data to be recorded. If a value is disputed, do we always have the disputed values?
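
To make the "implied by multiple values" idea concrete, here is a minimal sketch that derives disputes from the data instead of storing a flag. It is plain Python, independent of TerminusDB, and the `Fact` tuple, the `overlaps` and `implied_disputes` helpers, and the sample values are all hypothetical. Treating an undated fact as overlapping every period is an assumption of this sketch, not something the ontology states.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, NamedTuple, Optional, Tuple

Period = Optional[Tuple[int, int]]  # inclusive (start, end) years, or None if undated


class Fact(NamedTuple):
    subject: str    # e.g. a polity identifier (hypothetical)
    variable: str   # e.g. a variable name (hypothetical)
    value: str
    dates: Period


def overlaps(a: Period, b: Period) -> bool:
    """True if two periods share at least one year.

    Assumption: an undated fact overlaps everything.
    """
    if a is None or b is None:
        return True
    return a[0] <= b[1] and b[0] <= a[1]


def implied_disputes(facts: List[Fact]) -> List[Tuple[Fact, Fact]]:
    """Pairs of facts on the same subject and variable whose values
    differ while their periods overlap."""
    groups = defaultdict(list)
    for fact in facts:
        groups[(fact.subject, fact.variable)].append(fact)

    disputes = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            if a.value != b.value and overlaps(a.dates, b.dates):
                disputes.append((a, b))
    return disputes


facts = [
    Fact("polity_a", "population", "10000", (-500, -400)),
    Fact("polity_a", "population", "25000", (-450, -350)),
    Fact("polity_a", "population", "30000", (-300, -200)),
]
pairs = implied_disputes(facts)
```

With these sample facts only the first two conflict, since the third covers a later, non-overlapping period. Under this approach a dispute can never exist without its rival values, which is one possible answer to the question above.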

# NOTE: how come SV is not part of seshat_schema.__dict__['object']?? Because it didn't inherit from DocumentTemplate
# but to use we need to use ScopedValue before DocumentTemplate in the types below
class ScopedValue(DocumentTemplate):
    _schema = seshat_schema
    _subdocument = []
    _key = RandomKey() # All property instances that use SV need to be unique over the DB; this key is inherited properly
    #_key = LexicalKey("label") # not good enough
    # Everything in this class should be Optional

    # Eventually we will declare dates as follows and no longer need _dateRange
    # dates: Set[int] # should be Optional like GYearRange below but we have to be explicit about low level types
    # BUG: You cannot have Optional[List[int]]
    # terminusdb_client.errors.DatabaseError: Type error for json{'@class':"xsd:integer",'@type':"List"} which should be text
    # Eliminating Optional then requires always giving dates = [], not None
    # NOTE dates can be Set since it accepts empty, a singleton, or a distinct and ordered pair
    dates: Optional['_dateRange']
    # Confidence qualifiers
    # Semantics: if unknown is True then there will/must be no value set on the BoxedType tagged union
    # If unknown is True, suspected could be set True if it was supplied by an RA
    # and it would be removed (not set to False) once the expert approves
    # otherwise for a 'normal' fact it would have
    # 1) no unknown or suspected property,
    # 2) one of the BoxedType properties
    # 3) optionally one or both of disputed or inferred set
    unknown: Optional[bool]  # is the (typed) value explicitly unknown?
    # is the value (typically unknown) provided by an RA versus an expert
    suspected: Optional[bool]
    disputed: Optional[bool]  # is this one of several disputed values {}?
    inferred: Optional[bool]  # is the value inferred somehow?

jbennettgit commented 2 years ago

Yes, a fact that isn't explicitly tagged unknown (or suspected unknown, in the case of RA-supplied facts) is implicitly 'known'. The idea with the ScopedValue declarations was that these annotations, including the date range, are 'optional', but that additional constraints may apply if any of them are used. So, as you mention, if a set of facts is disputed then we have several (at least 2) fact triples, and they will share the same dates if dates are supplied (since dates are optional).
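
The constraints described here could be checked along the following lines. This is only a sketch in plain Python, not the TerminusDB schema itself: `FactRecord`, `is_known`, and `violations` are hypothetical names, the flags mirror the optional booleans on `ScopedValue`, and the single `value` field stands in for the BoxedType tagged union.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FactRecord:
    subject: str
    variable: str
    value: Optional[str] = None              # stand-in for the BoxedType value
    dates: Optional[Tuple[int, int]] = None  # optional (start, end) years
    unknown: Optional[bool] = None
    suspected: Optional[bool] = None
    disputed: Optional[bool] = None
    inferred: Optional[bool] = None


def is_known(rec: FactRecord) -> bool:
    """A fact is implicitly known unless tagged unknown or suspected."""
    return not rec.unknown and not rec.suspected


def violations(records: List[FactRecord]) -> List[str]:
    """Report facts that break the constraints sketched in this thread."""
    problems = []
    for rec in records:
        label = f"{rec.subject}/{rec.variable}"
        if rec.unknown and rec.value is not None:
            problems.append(f"{label}: unknown fact carries a value")
        if not rec.unknown and rec.value is None:
            problems.append(f"{label}: fact has neither a value nor unknown set")

    # Disputed facts must come in groups of at least two sharing the same dates.
    counts = Counter(
        (r.subject, r.variable, r.dates) for r in records if r.disputed
    )
    for (subject, variable, dates), n in counts.items():
        if n < 2:
            problems.append(
                f"{subject}/{variable}: disputed fact at {dates} has no rival value"
            )
    return problems


ok = [
    FactRecord("polity_a", "capital", "city_x", (-500, -400), disputed=True),
    FactRecord("polity_a", "capital", "city_y", (-500, -400), disputed=True),
    FactRecord("polity_a", "writing", unknown=True, suspected=True),
]
bad = [
    FactRecord("polity_b", "capital", "city_z", (-300, -200), disputed=True),
    FactRecord("polity_b", "writing", "present", unknown=True),
]
```

Here `violations(ok)` returns an empty list, while `violations(bad)` reports one unknown fact that carries a value and one disputed fact without a rival.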

When you have a chance it would be great to see the complete code you used to declare the Equinox schema and insert it into the DB using TerminusX!