Closed apriltuesday closed 1 year ago
@M-casado @tcezard As the submission window is earlier than we expected, this will have to be a quick one... If you don't have time in the next week or so please feel free to ignore this, we will at least have the updated automatic mappings.
If you do have time, the spreadsheet is here.
Because we did not get a chance to do the massive import from July, I tried to mark terms appearing in that list as SKIP so that you can filter them out and focus on other things. Hopefully that makes sense.
@apriltuesday I am curating the terms and found some things that I wanted to check with you:
- We are to add new import terms to the 2023-07-24 Add EFO disease tab, since we didn't do it last time, right?
- ZOOMA_SOURCE (so far I've seen previously-used, replacement, etc.)
- I see many Notes from the previous round not only on the UNSURE, but also on the Blanks. To avoid losing them, should we copy-paste them manually to this round's Comments? The rubric mentions to copy them when UNSURE.
- I have found suggested replacement mappings like MONDO:0018875||NOT_SPECIFIED|replacement|NOT_CONTAINED that:
  - Instead of having the URL as the first element, have the CURIE
  - Even though the term says NOT_CONTAINED, when I checked the term in EFO, it was indeed there. In this example, it says IMPORTED, from MONDO to EFO.
- Suggested replacement mapping (e.g. http://purl.obolibrary.org/obo/MONDO_0019601|obsolete autosomal recessive axonal hereditary motor and sensory neuropathy|NOT_SPECIFIED|replacement|EFO_OBSOLETE)
These last ones make me wonder if the replacement finding part of the process is working as we expect it to.
Ready for review
128 DONE
778 SKIP
47 IMPORT
3 UNSURE
5468 Blank
Bear in mind it's my first time doing these steps of the curation, so something may not look like it should (especially comparing some DONE numbers with regards to previous rounds).
I followed step by step the rubric, though, and filled a few blanks, but the ones that had a higher ClinVar Freq were already ringing a bell, and I assume they've been there round after round.
Thanks Marcos, I need to check on a few of your comments but here are some quick answers:
The comment's column seems to have some automatic populated formula. I'll have to remove it for the terms I want to leave a comment at.
Yes sorry, I added that as a temporary measure to skip the July terms and add an explanatory comment, please remove it for any Comment and Status cells as needed. It won't be there in the future (hopefully).
We are to add new import terms to the 2023-07-24 Add EFO disease tab, since we didn't do it last time, right?
You shouldn't do anything for either Add EFO disease tab, those are filled by the script after the manual curation.
I followed step by step the rubric
This is the right thing to do, as much as possible we should find shortfalls in the rubric and fix them in the documentation directly. It's more painful in the short-term but will serve us better in the long run IMHO.
I have found suggested replacement mappings like MONDO:0018875||NOT_SPECIFIED|replacement|NOT_CONTAINED that:
- Instead of having the URL as the first element, have the CURIE
- Even though the term says NOT_CONTAINED, when I checked the term in EFO, it was indeed there.
These are related, basically the format that we get in the replacement term field from the EFO API is inconsistent, so the code is also not able to check EFO containment properly. We should be able to fix the code to handle most of these cases, but there might always be exceptions as I think that field is manually filled by someone.
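To make the inconsistency concrete, here is a minimal sketch of how the first element could be normalised before any containment check. It assumes the five pipe-separated fields seen in the two examples in this thread and the standard OBO PURL pattern; the names SuggestedMapping, curie_to_uri and parse_mapping are hypothetical and not taken from the pipeline code, and the meaning of the third field is not stated here, so it is kept verbatim.

```python
import re
from typing import NamedTuple

# Assumed PURL prefix for OBO ontologies such as MONDO; other ontologies
# may use different URI patterns and are not handled by this sketch.
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"
# Matches identifiers like MONDO:0018875 but not full URLs.
CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9]*):(\w+)$")


class SuggestedMapping(NamedTuple):
    identifier: str   # first element, normalised to a URI where possible
    label: str        # may be empty, as in the MONDO:0018875 example
    third_field: str  # meaning not stated in this thread; kept verbatim
    source: str       # e.g. "replacement" or "previously-used"
    efo_status: str   # e.g. "NOT_CONTAINED", "EFO_OBSOLETE"


def curie_to_uri(identifier: str) -> str:
    """Turn an OBO-style CURIE into a PURL; leave anything else untouched."""
    match = CURIE_PATTERN.match(identifier)
    if match is None:
        return identifier
    prefix, local_id = match.groups()
    return f"{OBO_PURL_PREFIX}{prefix}_{local_id}"


def parse_mapping(cell: str) -> SuggestedMapping:
    """Split one pipe-delimited suggested mapping into its five fields."""
    parts = cell.strip().split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}: {cell!r}")
    identifier, label, third_field, source, efo_status = parts
    return SuggestedMapping(curie_to_uri(identifier), label, third_field, source, efo_status)
```

Note that the status carried in the string (NOT_CONTAINED in the example) would still be stale after this step; the containment lookup would need to be rerun with the normalised URI.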
For example I found the term MONDO:0017138 as a replacement (imported from MONDO) of the obsolete MONDO:0007779, but the former was not listed as a replacement in the spreadsheet.
The replacement process specifically looks for this "term replaced by" annotation (example from MONDO_0007903):
I don't see that annotation in your example which would be why it isn't picked up. We could ask SPOT whether there's another annotation we could be using.
I see many Notes from the previous round not only on the UNSURE, but also on the Blanks. To avoid losing them, should we copy-paste them manually to this round's Comments?
I guess so, it's really not ideal though... it would really be better if we could focus on curating terms rather than comments, but I think your proposal makes sense for now.
I think I've captured your feedback in #402 (for issues we can address in the short-term, at least), let me know if I've missed something.
Hi @apriltuesday, thanks for the responses 👍
The replacement process specifically looks for this "term replaced by" annotation (example from MONDO_0007903):
I think this may just be a case where the problem was not on our side. I reported it yesterday here; most likely their chosen replacement is simply not the best fit for purpose.
Just to emphasize: both @tcezard and @apriltuesday, please double check the numbers of DONE, because in previous iterations I saw thousands, but now, even adding the SKIP, we don't seem to get to that amount.
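For comparing these numbers between rounds, a small tally over a CSV export of the curation sheet might help. This is only a sketch: it assumes the sheet is exported as CSV with a column named Status (the column name is taken from this thread, the export step and the function name count_statuses are assumptions), and it counts empty cells as Blank.

```python
import csv
import io
from collections import Counter


def count_statuses(csv_text: str, status_column: str = "Status") -> Counter:
    """Count rows per curation status; empty or missing cells count as 'Blank'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # row.get() may return None for short rows, so fall back to "".
        status = (row.get(status_column) or "").strip()
        counts[status or "Blank"] += 1
    return counts
```

Running it on the exports of two rounds and comparing the resulting counters would show quickly whether the drop in DONE is real or an artefact of the SKIP marking.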
I've done a quick review of the DONE. I think the IMPORT we posted in July have not made it through to EFO yet. I checked a few of them and did not see them. I added a few IMPORT and was surprised to see many with exact matches from MONDO but they were not reported by ZOOMA. It might be something we need to check before the next manual curation.
I think the IMPORT we posted in July have not made it through to EFO yet.
We didn't actually submit the July import since we wanted to check with SPOT first, so this is expected... I'll add the terms for this round.
I added a few IMPORT and was surprised to see many with exact matches from MONDO but they were not reported by ZOOMA.
Added to #402.
EFO issue: EBISPOT/efo#2109
This includes imports from this round and July, so I'll close both.
FYI I did the splitting of the sheets into import/new completely manually for now, I think we should look at improving the script though (probably in #391)
Refer to documentation for full description of steps.
Checklist: