Store generated schema as raw datomic schema in EDN

mgrbyte commented 7 years ago

This will remove a level of abstraction that isn't really necessary. The idea is to consolidate the notion of schema in pseudoace, in both code and data terms.

I propose the following changes:
- Drop the dependency on the datomic-schema package. This is the library that produces the format of the files currently stored in /generated-schemas. It provides a level of abstraction that isn't necessary, and it seems to hide/distract from the :pace meta-schema. The package also has some issues which, while not insurmountable, would require it to keep up with changes in Datomic itself; I think it's better just to go with the raw schema.
- In the existing namespaces:
  - pseudoace.schemata
    - Make all the schemas def'd here into EDN files.
    - Keep the install function (its logic is still required), but change it to read the EDN files.
  - pseudoace.schema-datomic (remove)
    - Move all functions here to pseudoace.schemata, reworked so they don't use datomic-schema.
- Re-format all existing schemas in /generated-schemas.
- Move /generated-schemas to resourcecs/schema, and store just one version (the current one) of the schema, tracking historical schema versions with git diff. (This last one is tentative.)
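For illustration only, the "read EDN files" step could look roughly like the sketch below. It assumes each schema file holds a single EDN vector of raw Datomic attribute maps; the namespace and the `read-schema` name are hypothetical, not pseudoace's actual code.

```clojure
(ns example.schema-io
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import (java.io PushbackReader)))

(defn read-schema
  "Read one EDN form (assumed to be a vector of raw Datomic attribute
  maps) from `source`: a file, resource URL or Reader.
  `opts` is passed straight to clojure.edn/read, e.g. {:readers ...}
  if the files use tagged literals such as #db/id."
  ([source] (read-schema {} source))
  ([opts source]
   ;; with-open closes the PushbackReader and the reader it wraps.
   (with-open [rdr (PushbackReader. (io/reader source))]
     (edn/read opts rdr))))
```

The result could then be handed to `datomic.api/transact` as-is, e.g. `@(d/transact conn (read-schema (io/resource "schema/main.edn")))`, where the resource path is a made-up example. Any `*-fixups` would still need a second transaction.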

Once all this is done, there will be no need for the resolve-xref-info function I provided previously, as the information can then be inferred directly from the schema.

mgrbyte commented 7 years ago

If we agree this is a good way to go, I can close #55 and delete the schema-xref-support branch, which has been languishing.

It means not having to code around the problem (which isn't really a problem once we remove the abstraction!)

adamjohnwright commented 7 years ago

I agree with these changes.

mgrbyte commented 7 years ago

Here's an example of what I mean by a "raw" Datomic schema.

adamjohnwright commented 7 years ago

I think that is much more readable and extensible than the current schema format.

azurebrd commented 7 years ago

I imagine it would make more sense if I saw the raw version of the WS Datomic schema, but it sounds good. To make sure I understand: are we talking about changing the schema257.edn format into a fuller format with more information? Depending on how much Caltech (and other) curators want to be involved in modeling their datatypes, they'd want access to something relatively understandable. As it is, I don't think they find the current .edn understandable, and we'd be keeping this format in resourcecs/schema [sic?] anyway. There hasn't been much talk about how modeling will work, but this sounds like it may be downstream of that.

sibyl229 commented 7 years ago

@mgrbyte +1 for having the :pace meta-schema in the generated schema. It's very helpful.

mgrbyte commented 7 years ago

@sibyl229 @a8wright @azurebrd Here's an example of what the schema will end up looking like.

pseudoace-issue-65-example.zip

Notes

- :pace/xref is included as metadata, because :db/valueType :db.type/ref attributes and their corresponding xrefs cannot be transacted at once (it needs two passes, hence the fixups).
- There may well be other :pace-related attributes in this example that would need to be changed to metadata and processed in a second-phase transaction (anything that corresponds with the defs named *-fixups in schemata.clj).
- Partitions are not currently defined. This can be no other way** with the raw schema, since it is the exact representation of the schema that is transacted to Datomic.
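To make the two-pass idea concrete, here is a rough sketch of how an installer might split such a file into two transactions. It assumes each attribute map carries its xref data under a :pace/xref key in its Clojure metadata, and that the fixup simply asserts that value against the attribute's :db/ident; the real encoding used by the *-fixups defs may well differ, and `split-phases` is a hypothetical name.

```clojure
(ns example.schema-phases)

(defn split-phases
  "Split raw schema entries into [phase-1 phase-2].
  Phase 1: the attribute maps themselves (metadata is not transacted).
  Phase 2: one fixup map per entry whose metadata has :pace/xref,
  referring to the attribute by its ident once it exists.
  ASSUMPTION: xref data lives under :pace/xref in each map's metadata."
  [entries]
  (let [fixups (for [e entries
                     :let [xref (:pace/xref (meta e))]
                     :when xref]
                 {:db/id (:db/ident e) :pace/xref xref})]
    [(vec entries) (vec fixups)]))
```

Each vector would then go to its own `d/transact` call, phase 1 first, so that the ref attributes exist before the xrefs that point at them are asserted.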

Questions

Does this schema as attached provide enough information to work out corresponding types of references?
Is there anything about this example schema that's not understood? (It should be quite close to what we end up with.)

Formatting of the EDN file could be improved.

** We could potentially change the markup in the EDN to use some reader tag other than metadata, if desired.

sibyl229 commented 7 years ago

@mgrbyte this doesn't change how the schema is stored in Datomic once it's loaded, right?

I like the new format! Flat and easy to parse. Also, I think as long as :pace/use-ns and :pace/obj-ref show up with the attribute, it's great 👍

mgrbyte commented 7 years ago

@sibyl229 Yes, we have to preserve how the schema is stored in Datomic 1-to-1 with how it is now, and the proposal won't change that. Accordingly, :pace/use-ns, :pace/obj-ref (and all other :pace* schema items) will remain as they are today in the db.

This proposal only affects:
- how the schema is worked with from a developer's point of view
- ACeDB reliance (eventually this will enable us to modify the EDN directly without needing to parse ACe models)
- code complexity

azurebrd commented 7 years ago

@mgrbyte @sibyl229 Cool cool.

Yeah, I also like being able to see the :pace/use-ns and :pace/obj-ref. It's great being able to see all the possible attributes and how they relate, very useful.

I liked that the current/previous .edn version had indenting and grouped the attributes for a given datatype together, whereas with the new one it's not as easy to scan visually until the next line that begins with :, and some related schemata are in separate sections. Possibly I should be using http://datomic-rest-dev.wormbase.org:8888 instead of the schema###.edn file; I just got used to it, and it's easy to search through and jump around. It's certainly possible to use both .edn schemata files =)
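Since the new file is plain data, a per-datatype view like the old grouping could also be produced from it rather than from the file layout. A small sketch, assuming the namespace of each :db/ident (e.g. `gene` in `:gene/id`) names the datatype; `group-by-datatype` is a hypothetical helper, not part of pseudoace.

```clojure
(ns example.schema-browse)

(defn group-by-datatype
  "Map each :db/ident namespace (the datatype, by assumption) to a
  sorted vector of its attribute idents; keys come back sorted too."
  [entries]
  (->> entries
       (keep :db/ident)          ; skip entries without an ident
       (group-by namespace)      ; "gene" -> [:gene/id :gene/name ...]
       (into (sorted-map)
             (map (fn [[nspace idents]] [nspace (vec (sort idents))])))))
```

For example, feeding it the vector read from the schema file and printing the result with `clojure.pprint/pprint` gives one block of attributes per datatype.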

sibyl229 commented 7 years ago

@mgrbyte that's great! @azurebrd Maybe you could query the schema in Datomic directly? For example: https://github.com/Datomic/day-of-datomic/blob/master/tutorial/schema_queries.clj

azurebrd commented 7 years ago

Thanks @sibyl229, direct queries would be nice. The :find queries work well at http://datomic-rest-dev.wormbase.org:8888/browse, but I no longer have access to run the other kind of queries through lein repl. It may be nice to sometime, but it hasn't been necessary so far. Do you make all your queries through that tool and the .clj files through datomic-to-catalyst, or do you ever query some other way?

WormBase / pseudoace

Store generated schema as raw datomic schema in EDN #65
