GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Expose & allow editing of organism / sequence metadata #981

Open hexylena opened 8 years ago

hexylena commented 8 years ago

Just throwing ideas out here, but I wanted to at least write them down and maybe other people would have useful comments.

Now that we're putting increasingly large numbers of genomes in Apollo, data retrieval is coming up as a (relatively minor) pain point. I was considering how I might ease this process, and the operations which would occur often for us. Those operations are largely "fetch all genomes matching criteria X."

This has a natural representation in the organismprop / organism_dbxref / feature_cvterm* tables in Chado. They hold metadata about organisms / genomes / chromosomes, which would be useful if it could be queried against.

* = for landmark features only

It would be nice if...

If this data were available, I could imagine the API supporting filtering, so I could make requests like:

$ curl https://fqdn/apollo/organism/findAllOrganisms?cvterms=Gram-positive-bacterium-type%20cell%20wall
...
$ curl https://fqdn/apollo/organism/findAllOrganisms?cvtermIds=120123,123,456346
...
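The same requests could be built programmatically. This is only a sketch: the `cvterms` and `cvtermIds` parameters are the ones proposed above and are not known to exist in Apollo, `fqdn` is the placeholder host from the curl examples, and joining several term names with commas is an assumption borrowed from the `cvtermIds` example.

```python
from urllib.parse import quote, urlencode

# Placeholder host and path taken from the curl examples above.
BASE = "https://fqdn/apollo/organism/findAllOrganisms"


def organism_query_url(cvterms=None, cvterm_ids=None):
    """Build a URL for the proposed (hypothetical) filter parameters."""
    params = {}
    if cvterms:
        # Assumption: multiple term names are comma-separated.
        params["cvterms"] = ",".join(cvterms)
    if cvterm_ids:
        params["cvtermIds"] = ",".join(str(i) for i in cvterm_ids)
    # quote_via=quote encodes spaces as %20; safe="," keeps commas readable.
    query = urlencode(params, quote_via=quote, safe=",")
    return f"{BASE}?{query}" if query else BASE


if __name__ == "__main__":
    print(organism_query_url(cvterms=["Gram-positive-bacterium-type cell wall"]))
    print(organism_query_url(cvterm_ids=[120123, 123, 456346]))
```

A term name that itself contains a comma would need a different separator or repeated parameters, which is one reason filtering by id may be the safer option.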

Anyway. Just an idea.

nathandunn commented 8 years ago

Just to clarify... you want (from Chado land) data structures and web services that support:

I had added metadata for Organism previously to support (hopefully) JSON and it makes sense to do the same for Sequence.

My big question mark would be whether we want to add the more formal Chado versions of these or just leave them in JSON-land. The former would definitely make Chado export / import easier.

hexylena commented 8 years ago

@nathandunn yep, that sounds like what I want, support for that metadata so that my users can annotate it and query on it.

I'm not sure what you mean w/r/t more formal or not. The key point for us is that they're foreign keyed on the cvterm table so I can restrict the tags that the annotators use.
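To illustrate what "foreign keyed on the cvterm table" buys us, here is a rough sketch using an in-memory SQLite database. The schema is a deliberately simplified stand-in, not the real Chado or Apollo schema: the `organism_tag` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    common_name TEXT NOT NULL
);
-- Hypothetical link table: a tag must point at an existing cvterm row.
CREATE TABLE organism_tag (
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    cvterm_id   INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    PRIMARY KEY (organism_id, cvterm_id)
);
INSERT INTO cvterm VALUES (1, 'Gram-positive-bacterium-type cell wall');
INSERT INTO organism VALUES (1, 'example organism');
""")


def tag_organism(conn, organism_id, cvterm_id):
    """Attach a term to an organism; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO organism_tag VALUES (?, ?)",
                (organism_id, cvterm_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def organisms_with_term(conn, term_name):
    """Return the names of organisms tagged with the given term."""
    rows = conn.execute(
        """
        SELECT o.common_name
        FROM organism o
        JOIN organism_tag t ON t.organism_id = o.organism_id
        JOIN cvterm c ON c.cvterm_id = t.cvterm_id
        WHERE c.name = ?
        ORDER BY o.common_name
        """,
        (term_name,),
    )
    return [row[0] for row in rows]
```

With this layout, `tag_organism(conn, 1, 1)` succeeds, while `tag_organism(conn, 1, 999)` is rejected because no cvterm with id 999 exists. Annotators can only pick from terms that are already in the table, and the "fetch all genomes matching criteria X" query becomes a plain join.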

nathandunn commented 8 years ago

I think that sounds good. It might be a while until we have time to work on it, though I don't think it will be that bad. It should follow the same pattern as FeatureCVTerm.

I think that the trickier part is figuring out how you want users to annotate the Organism / Sequences within the database. We can link up to Chado, but we aren't explicitly pulling anything in (yet).

cmdcolin commented 8 years ago

I started a database migration script to move towards using OrganismProperties more:

https://github.com/cmdcolin/Apollo/commit/f3c1d842bfde8b3e631efb321eac69ff2431a5a7

It moves columns like blatdb and directory to properties, since I think these are not intrinsic to an organism and should be properties :)

It also tries to assert uniqueness on name to fix #990

nathandunn commented 7 years ago

@erasche This would be easy to do. Let me know if you need any help doing it and I can point you in the right direction.