geneontology / paint

This curation tool allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions.

PAINT file contents. #11

Closed monicacecilia closed 8 years ago

monicacecilia commented 8 years ago

@selewis @huaiyumi Valerie Wood (@valwood) reported this issue to the GO-Help Desk on 2015-04-09, and it seems to have been buried in JIRA. It was brought up again two weeks ago, and it appears the issue still needs to be addressed. The original ticket is http://jira.geneontology.org/browse/GO-799 and I will post comments from each person separately.

Initial issue from @valwood :

I'm not sure this is directly a PAINT issue, but I noticed some anomalies in the PAINT file and I don't know where they originate.

  1. There are some PomBase entries with a "pi" gene name in column 2 (DB object ID). For example:
     5 pi006, 5 pi013, 4 pi021, 6 pi026, 2 pi036, 3 pi038, 2 pi040, 3 pi042, 3 pi047, 9 pi051, 3 pi058, 3 pi072, 5 pi077, 8 pi078
     These are not official PomBase DB object IDs. All of the PomBase protein entries have a database object ID beginning SP*.
  2. 227 entries have GeneDB_Spombe. There shouldn't be any entries with GeneDB_Spombe in the GO database, so perhaps we just need to get rid of these. In addition, some of these entries have UniProt IDs in column 3, but I guess this problem will disappear when this is resolved?

Response from @selewis:

So PANTHER previously was delivering two separate lines:

'SCHPO|PomBase=pi072|UniProtKB=O13663' '' 'GPI ethanolamine phosphate transferase 3' 'Schizosaccharomyces pombe' 'PomBase:SPBC27B12.06,PomBase:pi072' 'gpi13,gpi13' '' '' '' '' '' '' '' '' 'PTN000552957'

'SCHPO|PomBase=SPBC27B12.06|UniProtKB=O13663' '' 'GPI ethanolamine phosphate transferase 3' 'Schizosaccharomyces pombe' 'PomBase:SPBC27B12.06,PomBase:pi072' 'gpi13,gpi13' '' '' '' '' '' '' '' '' 'PTN001497162'

Basically the same UniProt protein sequence, but two separate genes. (PANTHER in turn gets its IDs from the GCRP, the gene-centric reference proteome; see Chris's Google doc on the various proteome sets.) In v10 of PANTHER this problem is gone, so once I can run touchup and there's a reload, the problem should be gone.
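For anyone who wants to look for other cases like the one above, here is a minimal sketch in Python. It assumes only the long-ID layout visible in the two quoted lines (`SPECIES|GeneDB=gene_id|UniProtKB=accession`); the function names are hypothetical and not part of PAINT or PANTHER. It groups long IDs by UniProtKB accession and reports accessions that are attached to more than one gene ID.

```python
from collections import defaultdict


def parse_long_id(long_id):
    """Split 'SCHPO|PomBase=pi072|UniProtKB=O13663' into its three parts.

    Assumes exactly three '|'-separated fields, as in the lines quoted above.
    """
    species, gene_part, protein_part = long_id.split("|")
    gene_db, gene_id = gene_part.split("=", 1)
    _protein_db, accession = protein_part.split("=", 1)
    return species, f"{gene_db}:{gene_id}", accession


def accessions_with_multiple_genes(long_ids):
    """Return {accession: [gene IDs]} for accessions seen with more than one gene ID."""
    genes_by_accession = defaultdict(set)
    for long_id in long_ids:
        _species, gene, accession = parse_long_id(long_id)
        genes_by_accession[accession].add(gene)
    return {
        accession: sorted(genes)
        for accession, genes in genes_by_accession.items()
        if len(genes) > 1
    }


# The two long IDs from the lines quoted above.
example_ids = [
    "SCHPO|PomBase=pi072|UniProtKB=O13663",
    "SCHPO|PomBase=SPBC27B12.06|UniProtKB=O13663",
]
duplicates = accessions_with_multiple_genes(example_ids)
print(duplicates)
```

Run against the full set of long IDs, a check like this would list every protein that is represented by more than one gene identifier, which is the pattern described for pi072 and SPBC27B12.06.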

Tanya asked:

Is this still an issue? From the PANTHER site, I see: Version 10.0 (release date May 15, 2015). If touchup has been run and a reload has occurred, the problem should be gone, according to Suzi's last comment.

@valwood's latest response: issue still needs to be addressed

Still an issue. From here: http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gene-associations/submission/paint/pre-submission/

The pombase paint file still has annotations with the GeneDB prefix, even though these IDs have not been used in the PomBase GAF for >5 years:

142 GeneDB_Spombe
7051 PomBase

Column 2 should only contain fission yeast systematic IDs, but it still has some old identifiers that have been obsolete for about 10 years (they did not correspond to single genes):

pi047 pi058 pi072 pi077 pi078

The following are synonyms of the following genes:

pi013 is SPBC32H8.11
pi023 is SPBP22H7.08
pi026 is SPBP22H7.05c
pi035 is SPBC691.01
pi036 is SPBC17A3.10
pi038 is SPBC17A3.08
pi040 is SPBC17A3.06
pi042 is SPBC17A3.04c
pi044 is SPBC17A3.01c

I don't really understand where these are coming from, because the pombe protein sequences are fully represented in Swiss-Prot 1:1.
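The two checks described in this comment (counting the prefixes in column 1, and looking for column 2 IDs that are not systematic IDs) can be sketched in Python as follows. This is only an illustration: it assumes a tab-delimited GAF with `!` comment lines, it uses the "begins with SP" rule mentioned earlier in this thread as the test for a systematic ID, and the function name and sample rows are made up for the example rather than taken from the real file.

```python
from collections import Counter


def audit_gaf_lines(lines, expected_db="PomBase", id_prefix="SP"):
    """Count DB prefixes (column 1) and flag column 2 IDs lacking the expected prefix.

    Returns (db_counts, suspect_ids), both Counters keyed by string.
    """
    db_counts = Counter()
    suspect_ids = Counter()
    for line in lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 2:
            continue
        db, object_id = columns[0], columns[1]
        db_counts[db] += 1
        if db == expected_db and not object_id.startswith(id_prefix):
            suspect_ids[object_id] += 1
    return db_counts, suspect_ids


# Made-up sample rows (truncated to three columns) for illustration only.
sample_lines = [
    "!gaf-version: 2.0",
    "PomBase\tSPBC27B12.06\tgpi13",
    "PomBase\tpi072\tgpi13",
    "PomBase\tpi072\tgpi13",
    "GeneDB_Spombe\tSPBC32H8.11\texample",
]
db_counts, suspect_ids = audit_gaf_lines(sample_lines)
print(dict(db_counts))
print(dict(suspect_ids))
```

To run it on a real file, pass the open file handle as `lines`. Any `GeneDB_Spombe` total above zero, or any entry in `suspect_ids`, would correspond to the two problems reported here.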

ValWood commented 8 years ago

Thanks Moni!

monicacecilia commented 8 years ago

@ValWood please see the comment from @huaiyumi on 2015-12-09 (he responded in JIRA): These IDs are in reference proteome datasets that are used in PANTHER. I think I already sent the feedback to UniProt about this problem.

@huaiyumi, may I kindly ask that you share any feedback you get from UniProt, if any?

huaiyumi commented 8 years ago

I added my comment there. Basically, the data was provided by the reference proteome. We have already sent them our feedback about this problem.

Thanks,

Huaiyu

monicacecilia commented 8 years ago

OK. In that case, please close this issue whenever you think it is completed. Thanks! :) Cheers,

huaiyumi commented 8 years ago

I believe the issue was in the previous PANTHER release (9.0). The reference proteome has already been fixed, so the problem should no longer be present in v10.

Thanks,

Huaiyu

selewis commented 8 years ago

Fixed.

monicacecilia commented 8 years ago

:dancer: :dancer: :dancer: