Planteome / amigo

This repo is the Planteome fork of geneontology.org AmiGO2 project. Issues in this repo should be reported only on AmiGO issues. Issues can be pushed upstream if relevant for GO.
BSD 3-Clause "New" or "Revised" License

allow/convert spaces inside relationships in column 16? #28

Open austinmeier opened 7 years ago

austinmeier commented 7 years ago

@cmungall is there a way to allow spaces in column 16?

We have lots of germplasm annotations that receive the relationship "has_phenotype_score()", and the string that gets pulled from the Samara scrape file often has spaces in it. For example:

"Culm_diameter_(mm)_of_basal_internode_at_repro.=6"

Currently I am simply replacing spaces and other illegal characters with "_". It works, but it looks really bad when displayed on the browser. I was wondering if there is a way to allow spaces when the string containing them falls inside the "( )" of the relationship, or if there is something I can replace the spaces with that would be converted back to spaces when viewing them in the browser (think "%20" in URLs). In the above example, column 16 of the GAF would look like:

has_phenotype_score(Culm%20diameter%20(mm)%20of%20basal%20internode%20at%20repro.=6)

That way the browser would display this as:

has phenotype score Culm diameter (mm) of basal internode at repro.=6

Let me know if this is not clear.
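To make the proposal concrete, here is a minimal Python sketch of the round trip I have in mind. The function names `encode_extension` and `display_extension` are made up for illustration, and nothing here reflects what AmiGO currently does; it only shows the encoding on the GAF side and the decoding a browser view would need.

```python
from urllib.parse import quote, unquote


def encode_extension(relation: str, value: str) -> str:
    """Build a column 16 entry with the free-text value percent-encoded."""
    # Keep "(", ")" and "=" literal so the output matches the example above;
    # spaces (and separators such as "," and "|") are escaped.
    encoded = quote(value, safe="()=")
    return relation + "(" + encoded + ")"


def display_extension(entry: str) -> str:
    """Turn an encoded entry back into a human-readable label."""
    # Split at the first "(": relation name on the left, encoded value on the right.
    relation, _, rest = entry.partition("(")
    # Drop the closing ")" that wraps the whole value.
    inner = rest[:-1] if rest.endswith(")") else rest
    return relation.replace("_", " ") + " " + unquote(inner)


encoded = encode_extension(
    "has_phenotype_score",
    "Culm diameter (mm) of basal internode at repro.=6",
)
print(encoded)
print(display_extension(encoded))
```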

cmungall commented 7 years ago

is there a way to allow spaces in column 16?

No

You're already abusing poor col16 enough!

it looks really bad when displayed on the browser.

This is all solved if we make sure there is an ontology providing labels (and ideally defs) for all relations used.

The underlying storage in amigo actually uses the RO ID. The c16 format is already slightly hacky in that it allows the use of what are called 'shorthand' IDs. This is exactly analogous to what you see in obo files:

[Term]
id: PO:0000002
name: anther wall
relationship: part_of PO:0009066 ! anther

...

[Typedef]
id: part_of
name: part_of
xref: BFO:0000050 ! magic xref
is_transitive: true

You'll notice that in the OWL the part_of is gone; it's just the URI and the label (name).
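As a toy illustration of the shorthand idea (not how AmiGO or the OBO tooling is actually implemented), the sketch below reads [Typedef] stanzas and maps each shorthand ID to the ID given in its xref. The function name `shorthand_map` is hypothetical, and treating the first xref as the real ID is a simplifying assumption.

```python
OBO_TEXT = """\
[Typedef]
id: part_of
name: part_of
xref: BFO:0000050 ! magic xref
is_transitive: true
"""


def shorthand_map(obo_text: str) -> dict:
    """Map shorthand relation IDs (e.g. part_of) to their xref'd IDs."""
    mapping = {}
    current_id = None
    in_typedef = False
    for raw in obo_text.splitlines():
        # Drop trailing "! comment" text, as in "xref: BFO:0000050 ! magic xref".
        line = raw.split(" ! ")[0].strip()
        if line.startswith("["):
            # A new stanza starts; only [Typedef] stanzas define relations.
            in_typedef = line == "[Typedef]"
            current_id = None
        elif in_typedef and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_typedef and line.startswith("xref: ") and current_id:
            # Assumption: keep the first xref as the "real" ID.
            mapping.setdefault(current_id, line[len("xref: "):])
    return mapping


print(shorthand_map(OBO_TEXT))
```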

Now, I don't think we want Culm_diameter_(mm)_of_basal_internode_at_repro in RO. You can make a private ontology.

But we may want to explore another pattern

e.g.

has_measurement(FOO:nn),has_unit(UO:nn),has_value(6)

where FOO:nn is scale-independent

or

has_measurement(FOO:nn),has_value(6)

where FOO:nn has the scale

(and FOO:nn may be a CO class)
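A rough sketch of how either pattern could be generated when writing the GAF. The helper name `compose_measurement` is hypothetical, and FOO:nn / UO:nn are kept as the placeholders used above rather than real IDs.

```python
from typing import Optional


def compose_measurement(measurement_id: str, value, unit_id: Optional[str] = None) -> str:
    """Compose a column 16 expression from a measurement class, value and optional unit."""
    parts = ["has_measurement(" + measurement_id + ")"]
    if unit_id is not None:
        # Scale-independent measurement class: the unit is stated separately.
        parts.append("has_unit(" + unit_id + ")")
    parts.append("has_value(" + str(value) + ")")
    # The relations are joined with commas, as in the patterns above.
    return ",".join(parts)


print(compose_measurement("FOO:nn", 6, unit_id="UO:nn"))
print(compose_measurement("FOO:nn", 6))
```

With this shape, no free text needs to go into the relation arguments; the human-readable label comes from the ontology class behind FOO:nn.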

austinmeier commented 7 years ago

Oh I've been abusing column 16 since I got started on this GAF business!!

We intentionally added the "has_phenotype_value()" relationship in the TO so that we could have it displayed, but due to the insane variation in the "values" that we are pulling between Samara's scrape of GRIN and IRRI's GRIMS database, using a pre-composed pattern such as the one you've suggested becomes rather labor intensive.
I will look into doing something similar for the Samara scrape, as the data for that seems to be relatively "uniform".

If using the pattern suggested, can FOO:nn be the GRINDescr:nn for each grin descriptor? Because that might actually work out quite nicely for the GRIN data.

I'll do some poking around, and see what I find. The main issue I see is that the scales used in GRIN are not CO scales (or at least not exactly.) Perhaps Jorrit may be able to scrape the Descriptors and their respective scales, and we could whip up an internal ontology to support the phenotypes in GRIN... (just thinking out loud.)