geneontology / neo

noctua entity ontology

Please use http://purl.uniprot.org/uniprot or http://purl.uniprot.org/isoform/ IRIs for UniProt concepts #34

Open JervenBolleman opened 5 years ago

JervenBolleman commented 5 years ago

This will make it easier to link the UniProt data with the GO (A) data on RDF and OWL level.

Mostly, it will make it easier for us to introduce Noctua-compatible modelling for UniProt->GO term relations, with the benefit that users loading both datasets don't get duplicate triples just because we don't use the same IRIs.

cmungall commented 5 years ago

What does the uniprot PURL denote? If other graphs assert it's an IAO ICE we end up with incoherency. We need to treat it as a material entity (not that identifiers.org is clear on this)

@nataled are you using uniprot PURLs as ICEs in PRO?

nataled commented 5 years ago

At the moment we don't use them for anything other than what they are: database entries. In PRO they are only used for cross-references and evidences. However, going beyond that, they would be considered ICEs. Think of the distinction between SO and MSO: UniProtKB would be akin to SO, while PRO would be akin to MSO.

cmungall commented 5 years ago

Thanks! I'm most interested in what they are asserted or entailed to be in OWL.

If you have axioms that cause a uniprot PURL (as in an actual purl.uniprot.org PURL) to be entailed as an ICE (for example, through use of an object property with domain/range constraints) then the combined knowledge graph with GO-CAM will have inconsistencies.

I believe this is the case. I believe also that @JervenBolleman who is the authority on what the purls mean would say that these denote database records, not proteins. Both of these facts indicate that we should not use these purls for the neo classes (funnily enough, the PRO class has the intended semantics, but GO annotators want identifiers with UniProtKB prefixes, and we need all of at least swiss-prot materialized, which means we can't use PRO).

Note that SO classes are not subclasses of ICEs, many SO classes have instances that exist independently of database records.

nataled commented 5 years ago

Any Swiss-Prot entry can be trivially materialized in PRO. Some will need special treatment, of course, but we know how to deal with those. The only thing that stops us from doing it is that we've not had a request to do so. But, go ahead and try it. Take any Swiss-Prot accession that doesn't have a corresponding PRO, and prefix it with a PRO PURL (purl.obolibrary.org/obo/PR_). Works for TrEMBL too.
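The prefixing step described above can be sketched in a few lines of Python. This is only an illustration: the `http://` scheme is assumed (the comment gives the prefix without one), the accession check is a deliberately loose stand-in rather than the official UniProt accession pattern, and `pro_purl` is a hypothetical helper name.

```python
import re

# Assumed full form of the prefix quoted above ("purl.obolibrary.org/obo/PR_").
PRO_PREFIX = "http://purl.obolibrary.org/obo/PR_"

# Loose sanity check only; NOT the official UniProt accession grammar.
_LOOKS_LIKE_ACCESSION = re.compile(r"[A-Z0-9]{6,10}")


def pro_purl(accession: str) -> str:
    """Build a candidate PRO PURL by prefixing a UniProtKB accession."""
    acc = accession.strip().upper()
    if _LOOKS_LIKE_ACCESSION.fullmatch(acc) is None:
        raise ValueError(f"does not look like a UniProtKB accession: {accession!r}")
    return PRO_PREFIX + acc


if __name__ == "__main__":
    print(pro_purl("P05067"))
```

Whether the resulting PURL resolves for a given accession is something to check against PRO itself; the sketch only builds the string.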

JervenBolleman commented 5 years ago

OWL-DL speaking -> identifiers.org states

<http://identifiers.org/uniprot/P05067> owl:sameAs <http://purl.uniprot.org/uniprot/P05067>

so this is not a modelling change.

However, for ease of use in federated SPARQL queries, it helps the practical adoption of Noctua models a lot if we can cross-query them. IRI conversion in SPARQL queries is possible, but it is a pain that we would rather not have.
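To make the "possible but a pain" point concrete, here is a minimal sketch of the conversion, once as a client-side Python helper and once as a SPARQL 1.1 `BIND` fragment held in a string. The helper name `to_uniprot_purl` and the variable names `?gp` and `?protein` are illustrative assumptions, not part of any existing Noctua or UniProt code.

```python
IDENTIFIERS_ORG = "http://identifiers.org/uniprot/"
UNIPROT_PURL = "http://purl.uniprot.org/uniprot/"


def to_uniprot_purl(iri: str) -> str:
    """Rewrite an identifiers.org UniProt IRI to the purl.uniprot.org form.

    IRIs in any other namespace are returned unchanged.
    """
    if iri.startswith(IDENTIFIERS_ORG):
        return UNIPROT_PURL + iri[len(IDENTIFIERS_ORG):]
    return iri


# The same rewrite inside a query, using the standard SPARQL 1.1 functions
# STR, REPLACE and IRI. Every federated query would need a line like this.
SPARQL_REWRITE = r"""
BIND(IRI(REPLACE(STR(?gp),
                 "^http://identifiers\\.org/uniprot/",
                 "http://purl.uniprot.org/uniprot/")) AS ?protein)
"""

if __name__ == "__main__":
    print(to_uniprot_purl("http://identifiers.org/uniprot/P05067"))
```

If both datasets used the same IRIs, neither the helper nor the `BIND` line would be needed.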

DataRecord = owl:Class when I talk. It means that a single UniProt record/class represents between 0 and a practically infinite number of molecules, similar to a PRO class. Changing from rdf:type uniprot:Protein to rdfs:subClassOf uniprot:Protein is still on our todo list. However, with the current state of reasoners our users would have serious problems with the billions of axioms.

@cmungall @nataled using PRO or UniProt is a separate discussion from this bug report, and I suggest you open a separate issue to discuss it. IMHO The more flexible semantics of UniProt is actually key for Noctua success as PRO semantic limits make it invalid to express some desired annotations (especially regarding the function of secreted proteins).

cmungall commented 5 years ago

OWL-DL speaking -> identifiers.org states http://identifiers.org/uniprot/P05067 owl:sameAs http://purl.uniprot.org/uniprot/P05067

Where does this axiom come from?

curl -H "Accept: text/turtle" http://identifiers.org/uniprot/P05067

doesn't return anything

If there is a sameAs axiom, formally it doesn't affect us, since we're in OWL-DL and protected by punning (sameAs only applies to individuals, and we're using classes).

DataRecord = owl:Class when I talk.

OK, I will try and mentally translate but this extra layer confuses me. Would you apply this to GO too? To me, every GO class represents a process or cellular entity type. Yes, the class is also an information entity but this is implicit. It's most parsimonious to leave out talk of information entities when modeling unless one explicitly wants to talk about information entities.

IMHO The more flexible semantics of UniProt is actually key for Noctua success as PRO semantic limits make it invalid to express some desired annotations (especially regarding the function of secreted proteins)

Actually flexible semantics is not good for us, and much as I want easy federated querying, if we don't have logically consistent models, reasoning doesn't work and we rely on reasoning for everything. We need precise semantics.

Can you explain what you mean about secreted proteins? I don't see any challenges representing this as a GO-CAM (in fact we have axiomatized classes like renin secretion in the ontology using PRO semantics).

To summarize, we need pro-like semantics (proteins like 'human shh' as classes), but uniprotkb prefixes, as the community wants to annotate to uniprot.

It sounds like you might be open to providing the semantics we need, but are blocked by this:

Changing from rdf:type uniprot:Protein to rdfs:subClassOf uniprot:Protein is still on our todo list. However, with the current state of reasoners our users would have serious problems with the billions of axioms

What reasoners are you using? This seems like a fairly tractable technical challenge. And there may be options like using a tbox shadowed in the abox for internal reasoning but publishing as a tbox.

alanruttenberg commented 5 years ago

It seems clear that the UniProt concept and the PRO class are different sorts. Why can't the interoperability be handled in NEO's interface? E.g. Accept either ID as input. Use PRO ids internally, display ids according to preference for one or the other. Generate RDF/OWL suitable for integration into UniProt's SPARQL endpoint that matches UniProt's policy for how a PRO ID maps to a UniProt record.
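As a rough illustration of the interface-level handling suggested above, the following Python sketch accepts either prefix, stores one internal form, and displays according to preference. The prefix spellings `UniProtKB` and `PR`, the class name `ProteinId`, and above all the assumption that the two identifiers correspond one to one are simplifications made for the example; it ignores PRO classes that have no UniProt-derived accession.

```python
from dataclasses import dataclass

# Assumed CURIE prefixes, for illustration only.
UNIPROT_PREFIX = "UniProtKB"
PRO_PREFIX = "PR"
_KNOWN_PREFIXES = (UNIPROT_PREFIX, PRO_PREFIX)


@dataclass(frozen=True)
class ProteinId:
    """Internal, prefix-free form of a protein identifier."""

    accession: str

    @classmethod
    def parse(cls, curie: str) -> "ProteinId":
        """Accept either 'UniProtKB:<acc>' or 'PR:<acc>' as input."""
        prefix, sep, local = curie.partition(":")
        if not sep or prefix not in _KNOWN_PREFIXES or not local:
            raise ValueError(f"unsupported identifier: {curie!r}")
        return cls(local)

    def display(self, prefer: str = UNIPROT_PREFIX) -> str:
        """Render with whichever prefix the user or output format prefers."""
        if prefer not in _KNOWN_PREFIXES:
            raise ValueError(f"unknown prefix: {prefer!r}")
        return f"{prefer}:{self.accession}"


if __name__ == "__main__":
    pid = ProteinId.parse("PR:P05067")
    print(pid.display())      # UniProtKB form
    print(pid.display("PR"))  # PRO form
```

The sketch deliberately hides the semantic question: treating the two prefixes as interchangeable is exactly the assumption that would need to be exposed to tool users.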

There will be issues to address since PRO isn't strictly one to one with UniProt even at the organism-gene level. but those issues won't be addressed by simply equating the two. Exposing those assumptions clearly, and having the tool users understand what they are accepting by choosing one or the other identifiers would be quite a good thing insofar as making clearer the relation of UniProt to PRO.

Using the UniProt ids for protein classes also has the consequence that we no longer have an identifier for the information content entity that is the UniProt record, which we could otherwise use in different ways. For example, the canonical sequence (as information artifact) is part of the UniProt record, but it isn't the sequence of all the proteins in the class.

As an example of an issue consider the relation of the isoform to the organism-gene level. We use a subclass relation, but as far as I can tell, UniProt does not. I think it would be hard, and require substantial commitment, to coordinate the RDF/OWL in the sense of being able to simply add a piece of OBO RDF/XML to UniProt RDF/XML and expect the result to make sense. If we're not going to be able to do that it isn't clear what benefit there is to using the same identifier.

alanruttenberg commented 5 years ago

BTW, I'm happy to chat and discuss the issues, if you are interested.

cmungall commented 5 years ago

It seems clear that the UniProt concept and the PRO class are different sorts

I would like to explore this further, as it's not totally clear to me that they are. Sorry I missed the call.

Why can't the interoperability be handled in NEO's interface? E.g. Accept either ID as input. Use PRO ids internally, display ids according to preference for one or the other.

I would like to do this, but this would involve multiple exceptions into the code at different points, increasing overall fragility. On top of that, there are member groups of the GOC who have expressed that they want to annotate to UniProtKB IDs (including prefix) and I have to respect that.

There will be issues to address since PRO isn't strictly one to one with UniProt even at the organism-gene level. but those issues won't be addressed by simply equating the two. Exposing those assumptions clearly, and having the tool users understand what they are accepting by choosing one or the other identifiers would be quite a good thing insofar as making clearer the relation of UniProt to PRO.

Let's take the case of a GCRP swissprot entry and the corresponding entry. There are definitely issues to address here (e.g. sometimes GCRP will include trembl, but at least for human we should be 99% in agreement), but I think these are separable (and they are already being discussed elsewhere).

What would it mean to expose the different assumptions between

To a biologist and the users of Noctua these seem to indicate the same thing. And to me as well: I believe they are intended to denote the same thing, the PR purl is just clearer and more explicit about OWL commitments and relationships to other OBO entities.

Using the UniProt ids for protein classes also has the consequence that we no longer have an identifier for the information content entity that is the UniProt record, which we could otherwise use in different ways. For example, the canonical sequence (as information artifact) is part of the UniProt record, but it isn't the sequence of all the proteins in the class.

I'm not convinced you need this level of meta-representation, but in any case I believe you want to use a PURL with the sequence version embedded for this use case, the sequence in the db may change over time.

E.g., these differ by one residue:

https://www.uniprot.org/uniprot/Q9FXT6.fasta?version=1
https://www.uniprot.org/uniprot/Q9FXT6.fasta?version=74

So if you want to explicitly and logically represent an alignment relative to a sequence you'd need to use the version IRIs, or just encode the string directly.
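A small sketch of why the version matters: position-based statements are only meaningful relative to one specific sequence string. The function below compares two equal-length sequence versions residue by residue. The two short sequences are invented for the example and are not the Q9FXT6 sequences; `differing_positions` is a hypothetical helper.

```python
from typing import List, Tuple


def differing_positions(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch.

    Only handles substitutions; sequences of different length would need a
    real alignment, which is out of scope for this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length; an alignment would be needed")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    version_1 = "MKTAYIAK"   # made-up sequence, version 1
    version_74 = "MKTAYVAK"  # made-up sequence, later version
    print(differing_positions(version_1, version_74))  # [(6, 'I', 'V')]
```

Any annotation that says "residue 6" silently changes meaning between the two versions unless it points at a versioned IRI or carries the sequence string itself.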

As an example of an issue consider the relation of the isoform to the organism-gene level. We use a subclass relation, but as far as I can tell, UniProt does not.

Yes, this could cause big problems, if asserting a subclass introduces inconsistency.

I think it would be hard, and require substantial commitment, to coordinate the RDF/OWL in the sense of being able to simply add a piece of OBO RDF/XML to UniProt RDF/XML and expect the result to make sense. If we're not going to be able to do that it isn't clear what benefit there is to using the same identifier.

I think this is the crux of the issue. I agree that if the results of doing the combination are incoherency then it won't work (see the first comment from me in this ticket). At the moment there is a certain amount of shielding due to the punning, but that's not quite satisfactory (although that is a potential long term strategy here).

We need to know more about plans for OWL commitments on the uniprot PURLs from their maintainers. Comments above from Jerven like "Changing from rdf:type uniprot:Protein to rdfs:subClassOf uniprot:Protein is still on our todo list." suggest things are moving in the direction of compatibility, so I am hopeful.

alanruttenberg commented 5 years ago

I come to my conclusion about them being distinct sorts from two directions. First, as you say, PRO is very clear about what their entities denote. UniProt is not. Not because they can't or don't want to, but because they view their resource as a database, not an ontology. Without understanding exactly what their entities denote (and verifying that their logical assertions regarding them concord), we can't adequately compare them to PRO.

Second, where I have looked for implicit commitments as evidenced in assertions in their RDF, I find incompatibilities. We agree that combining our and their RDF will be incoherent.

in any case I believe you want to use a PURL with the sequence version embedded for this use case, the sequence in the db may change over time.

My presumption was that UniProt's RDF gave distinct sequences distinct PURLs. If so, then those would be adequate. If not, we would do whatever we have to in order to properly record sequence, but that would also expose another way in which the commitments of the two resources differ.

On the matter of respecting your users, I understand that need, but that seems to be something that you need to address within the tool, not necessarily in the ontologies. I haven't really looked at Noctua/NEO other than what I've seen in a couple of presentations, and so at the moment I don't understand its model and logical commitments. Because of that I can't speak to the use of UniProt IRIs there. What I do know is that, insofar as OBO ontologies go, these IRIs represent different things.

I would like to do this, but this would involve multiple exceptions into the code at different points, increasing overall fragility. On top of that, there are member groups of the GOC who have expressed that they want to annotate to UniProtKB IDs (including prefix) and I have to respect that.

SMOP. I have trouble sympathizing with the idea that in order to alleviate some bit of programming we should introduce substantial confusion about ontology. From my point of view, there is a perfectly coherent view of UniProt as a database consisting of ICEs, and PRO as an ontology, a view which is in concordance with that of the developers of each resource.

Regarding the multiple exceptions, if you are interested we could look at the code together and brainstorm to find a way to handle the interconversion in a clean and minimally disruptive manner.

--

If, at some point, UniProt were to decide that they want the resource to be understood as an OBO ontology, something I would love them to do (I've said so in the past), then that would reopen the question for me. A good collaboration between UniProt and PRO might be to undertake that effort assuming all parties were interested and committed, and that the effort could be funded.

cmungall commented 5 years ago

On the matter of respecting your users, I understand that need, but that seems to be something that you need to address within the tool, not necessarily in the ontologies

No, the requirement is that uniprotkb is used, regardless of tooling.

JervenBolleman commented 5 years ago

This issue was very specific regarding IRIs for UniProt resources, where I have a strong preference for using the resource's IRIs directly if they have an RDF form. If for logical reasons a different concept is required, my preference is to have a new IRI that relates to our IRI with as clear a semantics as possible, e.g. something like this: http://example.org/noctea-(re)interperation-of-uniprot/P05067 skos:closeMatch http://purl.uniprot.org/uniprot/P05067
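A bridge of this kind could be generated mechanically. The sketch below emits N-Triples lines linking a separately minted IRI to the UniProt PURL with `skos:closeMatch`. The namespace `http://example.org/protein-class/` and the function name `close_match_triples` are placeholders invented for the example; only the SKOS property IRI and the purl.uniprot.org pattern come from existing vocabularies.

```python
from typing import Iterable

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
UNIPROT_PURL = "http://purl.uniprot.org/uniprot/"
# Hypothetical namespace for the reinterpreted (protein class) IRIs.
BRIDGE_NS = "http://example.org/protein-class/"


def close_match_triples(accessions: Iterable[str]) -> str:
    """Return one N-Triples line per accession, bridging the two IRIs."""
    lines = [
        f"<{BRIDGE_NS}{acc}> <{SKOS_CLOSE_MATCH}> <{UNIPROT_PURL}{acc}> ."
        for acc in accessions
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(close_match_triples(["P05067", "P10403"]))
```

Because `skos:closeMatch` carries no OWL class or individual commitments, loading these triples next to either dataset should not by itself introduce the incoherency discussed earlier in the thread.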

I also don't mind the axioms added by Noctua to a UniProt IRI; I think all the ones I have seen are valid. But they might not apply if the entry is about a synthetic peptide, so maybe move up one more level to CHEBI:33695. I am not sure how that is/should be curated in Noctua models and whether those special cases need exact treatment.

My current belief is that it would always be valid to state that PRO:PAAAA rdfs:subClassOf uniprot:PAAAA but not always uniprot:PAAAA rdfs:subClassOf PRO:1. Mostly, I worry (too much) about the biological exceptions that are rather interesting and (lethally, considering many of them are about toxins) hard to represent accurately (I might have a fear of ontological over commitment).

@alanruttenberg we would love to work on formalizing aspects of UniProt curation, but funding for this is so hard to get :( I suspect it would end up looking a bit different from PRO but inspired by it, and have so many, many classes and axioms.

@cmungall I also agree with @alanruttenberg that I would love to attend a Noctua modelling and logic presentation. This is really off topic for this bug report, but do you have a pointer to good intro material?

cmungall commented 5 years ago

My current belief is that it would always be valid to state that PRO:PAAAA rdfs:subClassOf uniprot:PAAAA but not always uniprot:PAAAA rdfs:subClassOf PRO:1. Mostly, I worry (too much) about the biological exceptions that are rather interesting and (lethally, considering many of them are about toxins) hard to represent accurately (I might have a fear of ontological over commitment).

Sorry, I am not following this part

cmungall commented 5 years ago

synthetic peptides: we don't annotate to these, only gene products of genes, so these would not be in neo

pointer to good intro material? @balhoff's presentation from RO mtg: https://buffalo.app.box.com/s/spp9iam2zjoe0hmxjur5fvlyssvp56vl

alanruttenberg commented 5 years ago

This issue was very specific regarding IRIs for UniProt resources, where I have a strong preference for using the resource's IRIs directly if they have an RDF form. If for logical reasons a different concept is required, my preference is to have a new IRI that relates to our IRI with as clear a semantics as possible, e.g. something like this: http://example.org/noctea-(re)interperation-of-uniprot/P05067 skos:closeMatch http://purl.uniprot.org/uniprot/P05067

I think that using the PRO URIs in combination with skos:closeMatch is the best of both worlds. PRO terms have clear semantics, and PRO is already mapping them, where appropriate, to UniProt. Using skos:closeMatch is a good bridge between OBO ontology terms and a more RDF-oriented view.

What do you think, @nataled?

nataled commented 5 years ago

After further rumination and discussion, I have come to the conclusion that the main problem (for PRO) is that the scientific community uses UniProtKB identifiers to mean two different things. One, exemplified by GOA, is that they are basically the same as PRO; that is, they represent actual proteins that can be annotated with functions, etc. The other, exemplified by Pfam and other protein classification projects, is that they represent the sequences of those proteins. My concern about usage of UniProt vs PRO centers on the need (by PRO) for that latter interpretation: imposing the former interpretation on the uniprot purls would leave us without a way to talk about the sequences themselves. So, a question to @JervenBolleman: assuming that http://purl.uniprot.org/uniprot/P05067 refers to a class of proteins, how would you refer to, say, the canonical sequence of that class? If there is a way to separate the two interpretations, that solves the immediate problem. Bear in mind the following:

1) Personally, I think the right way to refer to the specific sequence is via UniParc. However, we are constrained by the fact that there are almost zero resources that refer to UniParc identifiers, and since we wish to import information (for example, classification into protein families), we have to use the same identifiers as those resources; that is, UniProtKB.

2) Using the version IRIs for sequences suffers from a related but different concern; namely, that we don't know which version is used by these resources. We understand that sometimes sequences are refined without changing the UniProtKB accession, and we are prepared to deal with that.

cmungall commented 5 years ago

In the uniprot triplestore there is a up:sequence property that connects an entry to isoform entries. But I think what is required is a PURL for the sequence specifically, e.g. having a PURL for https://www.uniprot.org/uniprot/Q9FXT6.fasta?version=1

alanruttenberg commented 5 years ago

Note that the isoform entries are isoforms in name (url) only. The actual type is up:Sequence documented as "An amino acid sequence".

nataled commented 5 years ago

Looking at an actual example RDF (https://www.uniprot.org/uniprot/P10403.rdf), it seems that the isoforms PURL is the sequence. I take as evidence of this two things (see attached screenshot):

1) "A" shows that the value of the isoform PURL is the sequence.
2) "B" shows that the isoform PURL is the resource used for positions.

My interpretation of this is that http://purl.uniprot.org/uniprot/P10403 can (does?) refer to the protein entity (in the PRO sense) while http://purl.uniprot.org/isoforms/P10403-1 refers to (what happens to be) the canonical sequence of that protein. @JervenBolleman can you confirm?

I also note the following:

1) The word 'isoform' in the isoform PURL given in this ticket is singular, but in practice it is plural.
2) Neither version actually resolves to anything useful.

An open question involves whether or not http://purl.uniprot.org/uniprot/P10403-1 is a valid PURL for the protein (material) entity that refers to that specific isoform.
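The PURL shapes mentioned in this thread can be told apart syntactically, which may help when auditing which form a dataset actually uses. The Python sketch below is an assumption-laden illustration: the regular expression is inferred only from the example IRIs quoted in this issue, it accepts both the singular and plural isoform spellings, and `parse_purl` and `UniProtRef` are hypothetical names. It reports what the IRI looks like and says nothing about what the IRI denotes.

```python
import re
from typing import NamedTuple, Optional


class UniProtRef(NamedTuple):
    kind: str               # "entry" or "isoform", taken from the path segment
    accession: str
    isoform: Optional[int]  # numeric suffix after "-", if present


# Pattern inferred from the example IRIs in this thread, not from a spec.
_PURL = re.compile(
    r"^http://purl\.uniprot\.org/(uniprot|isoforms?)/([A-Z0-9]+)(?:-(\d+))?$"
)


def parse_purl(iri: str) -> UniProtRef:
    """Split a purl.uniprot.org IRI into namespace kind, accession, isoform."""
    match = _PURL.match(iri)
    if match is None:
        raise ValueError(f"not a recognised UniProt PURL: {iri!r}")
    namespace, accession, isoform = match.groups()
    kind = "entry" if namespace == "uniprot" else "isoform"
    return UniProtRef(kind, accession, int(isoform) if isoform else None)


if __name__ == "__main__":
    print(parse_purl("http://purl.uniprot.org/uniprot/P10403"))
    print(parse_purl("http://purl.uniprot.org/isoforms/P10403-1"))
```

Note that an IRI such as http://purl.uniprot.org/uniprot/P10403-1 parses as kind "entry" with isoform 1; whether that is a valid PURL for the material entity is the open question above, and the parser does not answer it.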

goodb commented 5 years ago

Mainly repinging folks working on this thread. Wondering if we could try again for a consensus decision, as it's impacting GO work in multiple projects.

For what it's worth, after reading through the above it seems that there is a consensus that PRO OWL semantics are a better match for the Noctua use case than what we get from UniProt (RDF) now.

I see two things stopping us from switching over.

1) PRO would need to add all of the proteins needed by GOC annotators. According to @nataled above (regarding trembl) it sounds like this would be possible.
2) Either GOC folks are convinced to use the PRO ids (sounds unlikely) or through a SMOP they see what they want to see in the Noctua UI (for selecting genes) and in the Noctua output (especially the flatfile GPAD output). The SMOP would be greatly enabled if PRO maintained a clear semantic structure mapping from PRO classes to UniProt records (xref is not sufficiently clear in meaning).

?

cmungall commented 5 years ago

On Sun, Sep 29, 2019 at 10:58 PM goodb notifications@github.com wrote:

Mainly repinging folks working on this thread. Wondering if we could try again for a consensus decision, as it's impacting GO work in multiple projects.

For what it's worth, after reading through the above it seems that there is a consensus that PRO OWL semantics are a better match for the Noctua use case than what we get from UniProt (RDF) now.

I would put it slightly differently: we know the PRO OWL semantics work, but we don't know enough about the uniprot semantics to know if we can treat them as equivalent or as something else (but I see you address this below).

Additionally, we can't entirely put aside sociotechnological constraints of one set of IDs/URIs vs another...

I see two things stopping us from switching over. 1) PRO would need to add all of the proteins needed by GOC annotators. According to @nataled above (regarding trembl) it sounds like this would be possible.

What are the semantics of a non-GCRP trembl ID according to PRO?

But it's not just trembl. It would need to be the whole protein universe. The consequence of PRO going up to 260m+ gene-level entries in a single OWL file would need to be determined. At the least PRO needs to start distributing more ready-cut modules (which I've requested for a while)

2) Either GOC folks are convinced to use the PRO ids (sounds unlikely) or through a SMOP

SMOP?

they see what they want to see in the Noctua UI (for selecting genes) and in the Noctua output (especially the flatfile GPAD output). The SMOP would be greatly enabled if PRO maintained a clear semantic structure mapping from PRO classes to UniProt records. (xref is not sufficiently clear in meaning).

+1

Darren and Jerven is there anything we can do to facilitate this?

goodb commented 5 years ago

Sorry, I saw @alanruttenberg's use of SMOP (small matter of programming) above and liked its connotations. If we have the mappings from PRO to UniProt up front, I don't think it's terrible to handle the translations in the Noctua code. I have a cut at doing this for the Reactome entities -> UniProt for GPAD working in noctua-dev now.

I don't see how you avoid loading the whole protein universe without a Noctua stack architecture change. Whether it's a PRO expansion or UniProt being ingested into neo, we still end up with a gigantic OWL file.

For others' information: as it stands now, Noctua is driven from a 1.45 GB merged OWL file (go-lego), of which 1.12 GB is neo. This contains all of the classes that can be used to type the instances in the GO-CAM models, with neo containing the gene product classes. Although it introduces some technical hassle (e.g. the entire file is loaded by default when attempting to load a GO-CAM OWL model into Protégé or other tools), it actually works just fine for the Noctua application right now. It's probably drifting off topic here, but if there was a way to grow neo based on curator demand (e.g. one protein at a time as they needed it), we might be able to solve the giant OWL file problem.

JervenBolleman commented 5 years ago

@goodb and @cmungall could you please open separate issues for separate concerns? This issue was quite focussed in its request and now asks a zillion different things in your discussions.

Basically, my request is -> if you annotate UniProt entries use UniProt purls. If you are annotating something else, use something else.

Don't have users annotate UniProt but use PRO, nor have users annotate PRO and use UniProt. Not every UniProt case can be represented in PRO (or the other way around), nor are these the only two databases that users of Noctua might wish to use, e.g. neXtProt and Ensembl proteins are valid IRI targets for GO-CAM annotation as well.

cmungall commented 5 years ago

ok shall we do this on the pro tracker since this question (semantic rel between up and pro purls) isn't really a go issue per se

nataled commented 5 years ago

I'm fine with using the PRO tracker, even though pretty much all the unanswered questions are about UniProt. Here are the topics discussed (probably missed a few):

1) What do the UniProt PURLs denote: database entry, protein class, or sequence?
2) How does PRO relate to UniProt?
3) User needs: a SMOP, or address ontologically?

Topics 1 and 2 will be further addressed here: https://github.com/PROconsortium/PRoteinOntology/issues/165

Finally, one point of clarification:

Don't have users annotate UniProt but use PRO, nor have users annotate PRO and use UniProt. Not every UniProt case can be represented in PRO (or the other way around), nor are these the only two databases that users of Noctua might wish to use, e.g. neXtProt and Ensembl proteins are valid IRI targets for GO-CAM annotation as well.

Actually, every UniProt case CAN be represented in PRO. It's just that a small subset has to be done manually.

cmungall commented 5 years ago

Thanks Darren! I'm following the issue in the PRO tracker. We will hold up all discussion on this issue on this tracker for now

pgaudet commented 2 years ago

What's the status of this?

JervenBolleman commented 2 years ago

No progress. Still open. I prefer that Neo uses http://purl.uniprot.org/uniprot/A0A024BTL2 instead of http://identifiers.org/uniprot/A0A024BTL2 when talking about UniProt entries/classes. Especially now that identifiers.org does not recommend their own IRI pattern.