MOZI-AI / annotation-scheme

Human Gene annotation service backend
GNU General Public License v3.0

atomese->json and json->atomese #179

Open linas opened 4 years ago

linas commented 4 years ago

@tanksha and @Habush I have a generic idea I want to explore. How many of the biome datasets are available in json format?

I'm thinking that it might be worth creating a generic json->atomese importer, and a generic exporter. However, this would work only if

  1. the input json dataset formats are not badly/insanely designed (i.e. the resulting imported atomese would have to look reasonable enough to be useable)

  2. There are not too many conversions needed to obtain something that can be easily reasoned on or pattern-mined. So, for example, the import process might be (a) generic json->atomese (b) run a BindLink to convert this generic-atomese into something friendlier for mining/pln/moses (c) actually run mining/pln/moses. (so maybe @ngeiswei you'd have an idea about this?)

What is not clear is whether step (b) above is easier than just writing a custom json importer. If it's not easier, then this generic-import idea seems like a not-very-good idea. But I can't really tell...
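To make step (a) concrete, here is a minimal sketch of what a "generic" json->atomese import could look like. It is only an illustration, not an existing importer: the mapping (object keys become `Predicate`s inside `Evaluation`s, array items become `Member`s, nested values get dotted path names) and the function names `node`, `leaf` and `json_to_atomese` are all assumptions made for this example. It emits Atomese s-expressions as plain strings and assumes JSON-style string escaping is acceptable to the Scheme reader.

```python
import json


def node(kind, name):
    # Quote the name as a double-quoted string (assumed readable by Scheme).
    return f"({kind} {json.dumps(str(name), ensure_ascii=False)})"


def leaf(value):
    # Map a JSON scalar to a single atom (hypothetical mapping).
    if isinstance(value, bool) or value is None:
        return node("Concept", json.dumps(value))  # "true", "false", "null"
    if isinstance(value, (int, float)):
        return f"(Number {value})"
    return node("Concept", value)


def json_to_atomese(value, name="root"):
    """Return a list of Atomese s-expression strings for a JSON value."""
    if isinstance(value, dict):
        items = [(k, v, f"{name}.{k}") for k, v in value.items()]
    elif isinstance(value, list):
        items = [(None, v, f"{name}[{i}]") for i, v in enumerate(value)]
    else:
        return [leaf(value)]

    subject = node("Concept", name)
    out = []
    for key, child, child_name in items:
        if isinstance(child, (dict, list)):
            # Nested containers get their own Concept, named by their path.
            out.extend(json_to_atomese(child, child_name))
            obj = node("Concept", child_name)
        else:
            obj = leaf(child)
        if key is None:
            out.append(f"(Member {obj} {subject})")
        else:
            pred = node("Predicate", key)
            out.append(f"(Evaluation {pred} (List {subject} {obj}))")
    return out


if __name__ == "__main__":
    record = json.loads('{"gene": "BRCA1", "length": 3, "tags": ["a", "b"]}')
    for line in json_to_atomese(record, "rec"):
        print(line)
```

The output of something like this is the "generic-atomese" that step (b) would then have to rewrite, so how awkward these path-named `Concept`s are to match in a BindLink is exactly the open question.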

(follow up to issue #164)

Habush commented 4 years ago

> How many of the biome datasets are available in json format?

We don't store any of the datasets in json format. The atomese to json conversion happens on-the-fly after running the annotation.

We are currently converting the atomese into a specific JSON format which we use for a specific purpose (visualization). But if we are to design a generic importer/exporter, we should ignore the current format and start from scratch, specifying a schema/format that conveys as much of the information in the atomese as possible. After that, we can replace the current parser code with the generic one and write an adapter for the visualizer.
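As a rough idea of what such a generic schema could look like, here is a minimal sketch that turns an Atomese s-expression into nested JSON. The schema (`{"type", "name"}` for nodes, `{"type", "outgoing"}` for links) and the function name `atomese_to_json` are assumptions for illustration, not the format the annotation service uses today. It does not handle truth values or other non-atom forms, and it assumes node names use JSON-compatible string escapes.

```python
import json
import re

# Tokens: parentheses, double-quoted strings, or bare symbols/numbers.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def atomese_to_json(text):
    """Parse one s-expression into a nested dict (hypothetical schema)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            # Quoted strings are unescaped; bare tokens are kept as text.
            return json.loads(tok) if tok.startswith('"') else tok
        atom_type = tokens[pos]
        pos += 1
        args = []
        while tokens[pos] != ")":
            args.append(read())
        pos += 1  # consume the closing parenthesis
        if len(args) == 1 and not isinstance(args[0], dict):
            return {"type": atom_type, "name": args[0]}  # a node
        return {"type": atom_type, "outgoing": args}  # a link

    return read()


if __name__ == "__main__":
    sexpr = '(Evaluation (Predicate "gene") (List (Concept "rec") (Number 3)))'
    print(json.dumps(atomese_to_json(sexpr), indent=2))
```

An adapter for the visualizer would then be a separate function from this generic structure to the current visualization format.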

linas commented 4 years ago

> We are currently converting the atomese into a specific JSON format

You should continue to do that, and I am NOT recommending that this be replaced or changed in any way.

I am asking a different question. There are datasets -- gene ontologies, proteome datasets, the GGI and PPI datasets, etc. -- for which @tanksha has written importers. Again, I am NOT recommending that those importers be thrown away or re-written. They work, so that's good enough.

What I'm asking about is OTHER datasets: e.g. SBML (Systems Biology Markup Language) - is it available as json? If we imported the json, is the "natural", "generic" import good enough, or would it require a custom importer?

linas commented 4 years ago

@rekado please let me know if you have any opinion about this.

mjsduncan commented 4 years ago

@linas, SBML isn't available in json, but it is convertible to OWL (via BioPAX), and all the bio-ontologies we want to import are also available in OWL. A wizard/toolkit/pipeline for translating OWL format to atomese would be hugely useful. IMO translating the bulk databases could be done via json because they all have APIs, but in general the semantic translation is pretty simple, and it seems more efficient to do it via bulk downloads.