Actually, there are more missing terms. We have 257 variable terms in the latest file from cassavabase, but only 239 in the working copy in this repo. I am working on fixing this. These terms must have been lost in one of the reformatting cycles.
Good that you brought this issue up. I think we should first be clear about the files involved. Let's call:
Is it correct that you identified 18 variables missing from file C compared with file A (257 variables in file A versus 239 in file C)?
If you mean that file A is the OBO at https://github.com/nextgencassava/cassava_ontology, I cannot make sense of the discrepancy, because:
As I could not look at the file A you meant, I was not able to derive the full list of missing terms. Nevertheless, I have worked on the specific examples you gave (15, 77, 123, 224) and have identified two causes of the loss.
On 04/03/2015, Afola sent an updated version of the cassava ontology to Elizabeth. He sent 2 versions of this ontology:
Leaving aside CO_334:0000027 bacterial disease, CO_334:0000028 viral disease, CO_334:0000029 fungal disease and CO_334:0000030 insect damage, the OBO has two variables that are absent from the TD: CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf.
At that time, I had assumed that these two files were equivalent, so I worked on file A1 (the Excel TD) and not on file A2 (the OBO). This might account for the loss of two variables (CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf).
I looked for IDs that were lost while curating, converting and exchanging files by comparing file A1 and file B (I saw no conversion issue between file B and file C). I searched for IDs that were present in file A1 but disappeared in file B, and found only two variables: CO_334:0000015 Harvest Index and CO_334:0000224 staygreen.
I have not checked, so I cannot say when or why they were lost. But I apologize in advance if the loss of these two variables is my responsibility.
To the best of my knowledge and understanding, I can only make sense of this issue by concluding that only CO_334:0000015 Harvest Index and CO_334:0000224 staygreen were lost during curation/formatting/conversion, and that only CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf were left out of the curation process.
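A comparison like the A1-versus-B check above can be reduced to a set difference on term IDs. The sketch below is a minimal illustration, assuming both files have been exported to plain text (the OBO as-is, the Excel TD saved as CSV/TSV); the function names `co_ids` and `lost_ids` and the sample strings are hypothetical, not part of any existing tooling for this repo.

```python
import re

# Matches CO_334 identifiers with exactly seven digits. It also tolerates the
# "CO334:" spelling seen in some copies and normalizes both to "CO_334:NNNNNNN".
CO_ID = re.compile(r"CO_?334:(\d{7})\b")


def co_ids(text: str) -> set[str]:
    """Return every CO_334 identifier mentioned anywhere in the text."""
    return {f"CO_334:{digits}" for digits in CO_ID.findall(text)}


def lost_ids(older: str, newer: str) -> list[str]:
    """IDs present in the older export but absent from the newer one, sorted."""
    return sorted(co_ids(older) - co_ids(newer))


if __name__ == "__main__":
    # Hypothetical excerpts standing in for file A1 (TD export) and file B (OBO).
    file_a1 = (
        "CO_334:0000015\tHarvest Index\n"
        "CO_334:0000016\texample trait\n"
        "CO_334:0000224\tstaygreen\n"
    )
    file_b = "[Term]\nid: CO_334:0000016\nname: example trait\n"
    print(lost_ids(file_a1, file_b))
```

Note that this scans for any mention of an ID, so in an OBO file a term referenced only in another term's `is_a:` line would still count as present; restricting the match to `id:` lines would be the stricter check.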
Thanks in advance for sharing any further information that could help identify other missing variables.
Leo,
I checked the versions again, and it looks like you are correct: the only missing terms are 0000015, 0000077, 0000123 and 0000224! We might have other terms that did not make it into the CO version of 4/3/15. I will add these 4 now and will try to add proper methods and scales. If more variables fell through the cracks between April and now, we will add them to this OBO file again.
77, 123, 224, and 256 are missing from the CO file, e.g. http://www.cassavabase.org/chado/cvterm?action=view&cvterm_id=76806