Planteome / CO_334-cassava-traits

Cassava Trait Ontology maintained by Crop Ontology and Cassavabase
https://cropontology.org/term/CO_334:ROOT

Add missing variables from Cassavabase #1

Closed nmenda closed 9 years ago

nmenda commented 9 years ago

77, 123, 224, and 256 are missing from the CO file, e.g. http://www.cassavabase.org/chado/cvterm?action=view&cvterm_id=76806

nmenda commented 9 years ago

Actually, there are more missing terms. We have 257 in the latest file from Cassavabase, but only 239 variable terms in the working copy in this repo. I am working on fixing this. These terms must have gotten lost in one of the reformatting cycles.
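A count like the one above can be reproduced by extracting term IDs from each OBO file. The following is a minimal sketch, assuming the usual OBO layout in which every `[Term]` stanza carries an `id:` line. The function name `term_ids` and the sample text are illustrative and not taken from the repository, and the sketch counts all terms rather than only variable terms.

```python
import re

# Matches lines such as "id: CO_334:0000015" inside an OBO stanza.
ID_LINE = re.compile(r"^id:\s*(CO_334:\d{7})\s*$", re.MULTILINE)

def term_ids(obo_text: str) -> set[str]:
    """Return the set of CO_334 term IDs declared in the given OBO text."""
    return set(ID_LINE.findall(obo_text))

# Illustrative sample only, not real file content.
sample = """\
[Term]
id: CO_334:0000015
name: harvest index

[Term]
id: CO_334:0000224
name: staygreen
"""

ids = term_ids(sample)
print(len(ids), sorted(ids))
```

Running the same function over two versions of the file gives two sets whose sizes can be compared directly.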

leova commented 9 years ago

Good that you bring this issue up. I think we should first be clear on the files involved in the issue. Let’s call:

Is it correct that you identified 18 variables missing between file A and file C (257 variables in file A and 239 variables in file C)?

What is file A?

If you mean that file A is the OBO on https://github.com/nextgencassava/cassava_ontology, I cannot understand because:

As I could not look at the file A you meant, I was not able to derive the full list of missing terms. Nevertheless, I have worked on the other examples you gave (15, 77, 123, 224) and I have identified 2 causes of loss.

1/ the terms were not present in the original working curation file

On 04/03/2015, Afola sent an updated version of the cassava ontology to Elizabeth. He sent 2 versions of this ontology:

Leaving aside CO_334:0000027 bacterial disease, CO_334:0000028 viral disease, CO_334:0000029 fungal disease, and CO_334:0000030 insect damage, the OBO has 2 variables that are absent from the TD: CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf.

At that time, I assumed that these two files were equivalent, so I worked on file A1, the Excel TD, and not on file A2, the OBO. This might account for the loss of 2 variables (CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf).

2/ the terms were lost during the curation/formatting process

I have looked for IDs that were lost while curating, converting, and exchanging files by comparing file A1 and file B (I saw no conversion issue between file B and file C). I looked for IDs that were present in file A1 and that disappeared in file B, and found only 2 variables: CO_334:0000015 Harvest Index and CO_334:0000224 staygreen.
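The comparison described above amounts to a set difference on term IDs. Here is a minimal sketch, assuming the IDs of each file have already been collected into Python sets. The function name `lost_ids`, the variable names, and the short ID lists are illustrative stand-ins for the real contents of file A1 and file B, and `CO_334:9999999` is a made-up placeholder ID.

```python
def lost_ids(before: set[str], after: set[str]) -> list[str]:
    """IDs present in `before` but absent from `after`, sorted for stable output."""
    return sorted(before - after)

# Illustrative subsets only; the real files hold far more terms.
# "CO_334:9999999" is a hypothetical placeholder for a term kept in both files.
file_a1_ids = {"CO_334:9999999", "CO_334:0000015", "CO_334:0000224"}
file_b_ids = {"CO_334:9999999"}

print(lost_ids(file_a1_ids, file_b_ids))
# ['CO_334:0000015', 'CO_334:0000224']
```

Swapping the arguments would list IDs that appear only in the later file, which helps spot terms added or renumbered during curation.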

I have not checked further, so I cannot say when and why they got lost. But I apologize in advance if the loss of these 2 variables is my responsibility.

My conclusion

To the best of my knowledge and understanding, I can only make sense of this issue by saying that only CO_334:0000015 Harvest Index and CO_334:0000224 staygreen were lost during curation/formatting/conversion, and that only CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf were left out of the curation process.

Thanks for sharing more information that can help identify other missing variables.

nmenda commented 9 years ago

Leo,

I checked the versions again, and it looks like you are correct: the only missing terms are 0000015, 0000077, 0000123, and 0000224! We might have other terms that did not make it into the CO version on 4/3/15. I will add these 4 now and will try to add proper methods and scales. If more variables fell through the cracks between April and now, we will add them again to this OBO file.

nmenda commented 9 years ago

https://github.com/Planteome/ibp-cassava-traits/commit/7fa2a636f7bc136058f6cce1cc5731c991198514 closes this issue