Closed Hannah-Davies closed 2 years ago
@shahzadmumtaz22 is there any update on this, please?
Hi Hannah-Davies,
I have replied via email.
@ieuans are you able to look at the other issues? Email text (as I was unable to attach the email itself): Please find below my feedback and investigation results (marked in red text) for each of the data-source issues highlighted by Susheel. There were mainly three different kinds of issues:
• ID: 16 - HES APC. The problem seems to be in the source markdown file, where the data source name is not given correctly. The name shown appears to come from that incorrect name, which cannot be linked to an existing data source, so the import created an additional data source. As a quick fix, I can change the GitHub source and open a PR for Spiros to review and merge into the GitHub repository. Note that the data_sources.yml file does include this dataset, with its id and URL.
• ID: 19 - Office for National Statistics - Death Registration Data? There is only one phenotype associated with this, and its hyperlink works fine. The data_sources.yml file has the id and URL. Something must have gone wrong with the URL and id during the import.
• ID: 7 - CPRD GOLD. There are four phenotypes associated with it. This can be fixed in the source GitHub repository by changing the data source name to "Clinical Practice Research Datalink GOLD" instead of "Primary care (Clinical Practice Research Datalink GOLD)".
Uncertain Datasets
• ID: 27 - Clinical Practice Research Datalink - There is no CPRD dataset. This should be either CPRD GOLD or AURUM. This has been fixed in the source: "GOLD" was missing from the name, so the import script created another data source.
• ID: 19 - Primary Care - Suggest to remove. I think the id of this one should be 20, not 19. In the source it is written only as "primary", so it is not clear whether we should treat it as GOLD or AURUM. There is only one phenotype under this data source, and it has no associated publication. It would be better to get some input on this from Spiros.
• ID: 24 - QResearch? Which dataset? Some of the UoM phenotypes are associated with this primary care dataset. I can't find this dataset in the healthdatagateway. The link to this dataset is external (i.e. https://www.qresearch.org/).
• ID: 25 & ID: 14 - THIN are the same data source. They are the same dataset, and I think they appear separately because the two GitHub repositories (CALIBER and UoM) use slightly different names: one includes the abbreviation at the end and one does not. I have amended the UoM source, so this should not appear after the next import, if we plan to run one before the release. I was not able to find this dataset in the healthdatagateway, and the link to it is external (i.e. https://www.the-health-improvement-network.com/).
I hope this helps. If you have any further queries, please let me know.
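To illustrate the name-matching problem described in the email, here is a minimal sketch of how variant names could be resolved to a single data source before import. It is not the actual import script: `normalise`, `resolve`, `REGISTRY` and `ALIASES` are hypothetical names, the full THIN name is assumed from the external link, and assigning ID 14 (rather than 25) to THIN is an arbitrary choice for the example.

```python
import re

def normalise(name: str) -> str:
    """Lower-case a name, drop a trailing '(ABBR)' and collapse whitespace."""
    name = re.sub(r"\s*\([A-Z]+\)\s*$", "", name.strip())
    return re.sub(r"\s+", " ", name).lower()

# Hypothetical registry keyed by normalised name (stand-in for data_sources.yml,
# whose real structure is not shown in this thread).
REGISTRY = {
    normalise("Clinical Practice Research Datalink GOLD"): 7,
    normalise("The Health Improvement Network (THIN)"): 14,  # ID choice is illustrative
}

# Hypothetical alias table for known misspelt or over-qualified names.
ALIASES = {
    normalise("Primary care (Clinical Practice Research Datalink GOLD)"):
        normalise("Clinical Practice Research Datalink GOLD"),
}

def resolve(name: str):
    """Return the data source id for a name, or None if it cannot be linked."""
    key = normalise(name)
    key = ALIASES.get(key, key)
    return REGISTRY.get(key)

if __name__ == "__main__":
    for raw in [
        "The Health Improvement Network",
        "The Health Improvement Network (THIN)",
        "Primary care (Clinical Practice Research Datalink GOLD)",
        "Primary care",
    ]:
        print(raw, "->", resolve(raw))
```

With a check like this, a name that resolves to `None` could be flagged for review instead of silently creating a new data source.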
@shahzadmumtaz22 we still think there is an issue with the below:
• Cardiovascular code list - secondary data under primary
• Pneumonia - references SNOMED instead of UK Biobank
• Ethnic status - table still incorrect for ethnicity coding
I just wanted to check the process for updating the data sources on the Phenotype Library with the datasets registered on the Gateway. I note the following discrepancies, but I am unable to help fix them:
Missing Gateway Dataset URLs:
• ID: 3 - Civil Registration - Deaths
• ID: 8 - CPRD Aurum (Same as ID: 6)
• ID: 17 - SMR01
• ID: 16 - HES APC
• ID: 19 - Office for National Statistics - Death Registration Data?
• ID: 7 - CPRD GOLD
Uncertain Datasets
• ID: 27 - Clinical Practice Research Datalink - There is no CPRD dataset. This should be either CPRD GOLD or AURUM
• ID: 19 - Primary Care - Suggest to remove
• ID: 24 - QResearch? Which dataset?
• ID: 25 & ID: 14 - THIN are the same data source
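The discrepancies listed above could in principle be caught automatically. The following is a rough sketch only, assuming each data source is available as a record with `id`, `name` and `url` fields. That record layout, the function names and the placeholder URL are assumptions for illustration, not the Phenotype Library's actual schema.

```python
from collections import Counter

def missing_urls(sources):
    """Return the ids of data sources whose URL is absent or blank."""
    return [s["id"] for s in sources if not (s.get("url") or "").strip()]

def duplicate_ids(sources):
    """Return the ids that are used by more than one data source."""
    counts = Counter(s["id"] for s in sources)
    return sorted(i for i, n in counts.items() if n > 1)

if __name__ == "__main__":
    # Illustrative records only; names and ids are taken from the list above,
    # the URL value is a placeholder.
    sources = [
        {"id": 3, "name": "Civil Registration - Deaths", "url": None},
        {"id": 17, "name": "SMR01", "url": ""},
        {"id": 19, "name": "Office for National Statistics - Death Registration Data", "url": None},
        {"id": 19, "name": "Primary Care", "url": "https://example.org/placeholder"},
    ]
    print("Missing URLs:", missing_urls(sources))
    print("Duplicate ids:", duplicate_ids(sources))
```

Running such a check after each import would list entries like ID 19, which appears under two different names in this issue.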