I wrote a cleanup script that, among other things, maps every GO term to its primary id (some GO terms have several synonymous ids, listed as `alt_id`s in the OBO file). This uncovers some additional duplicates: genes annotated with GO terms X and Y even though X and Y are the same term. It seems to affect only a very small fraction of the annotations (for wheat, 177 annotations, i.e. 0.014 %).
Nevertheless, as a systematic question: should we include this cleanup step in the pipeline itself?
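For discussion, here is a minimal sketch of what such a pipeline step could look like. This is not the cleanup script mentioned above. It assumes annotations are plain `(gene, GO id)` pairs and that `id:` precedes any `alt_id:` lines inside each `[Term]` stanza. The function names `parse_alt_id_map` and `merge_annotations` are hypothetical. It builds an `alt_id` → primary id map from the OBO file, rewrites every annotation to the primary id, and drops the pairs that become duplicates.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Annotation = Tuple[str, str]  # assumed shape: (gene id, GO id)


def _tag_value(line: str, tag: str) -> str:
    """Return the value after 'tag:', dropping an optional trailing ' ! comment'."""
    return line[len(tag) + 1:].split(" !")[0].strip()


def parse_alt_id_map(obo_lines: Iterable[str]) -> Dict[str, str]:
    """Map every alt_id to the primary id of the [Term] stanza it appears in."""
    alt_to_main: Dict[str, str] = {}
    current_id: Optional[str] = None
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas carry GO term ids.
            in_term = line == "[Term]"
            current_id = None
            continue
        if not in_term:
            continue
        if line.startswith("id:"):
            current_id = _tag_value(line, "id")
        elif line.startswith("alt_id:") and current_id is not None:
            alt_to_main[_tag_value(line, "alt_id")] = current_id
    return alt_to_main


def merge_annotations(
    annotations: Iterable[Annotation], alt_to_main: Dict[str, str]
) -> Tuple[List[Annotation], int]:
    """Rewrite GO ids to their primary ids and drop the resulting duplicates.

    Returns the deduplicated annotations (first occurrence kept, order
    preserved) and the number of annotations removed.
    """
    seen: Set[Annotation] = set()
    merged: List[Annotation] = []
    removed = 0
    for gene, go_id in annotations:
        key = (gene, alt_to_main.get(go_id, go_id))  # ids without an alias stay as-is
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        merged.append(key)
    return merged, removed
```

Keying on `(gene, GO id)` alone is a simplification. If the pipeline keeps evidence codes or other columns, the deduplication key would have to be widened, or annotations would have to be merged rather than dropped.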
I wrote a cleanup script that among other things merges all GO terms to their respective main ids (some GO terms have multiple synonymous ids listed as alt_ids in the OBO file) which uncovers some additional duplicates (genes annotated with GO terms X and Y even though they mean the same thing). This seems to affect only a very minor part of the annotations (in the case of wheat 177 annotations, that's 0.014 %). Nevertheless, as a systematic question: Should we include this cleanup step into the pipeline itself?