ljwh2 closed this issue 1 year ago
The following should be changed to 'UKB':
'others' should be 'other'
'multiple cohorts' should be 'multiple'
The following should all be changed to 'RS'
'GenScot' should be 'GS:SFHS'
Any commas within the cohort field should be changed to pipes
Any forward slashes within the cohort field should be changed to pipes
Trailing spaces should be removed
Special characters, e.g. 'Ôªø', should be removed
Note: Letter case should be ignored when consolidating cohort names
'All New Diabetics In Scania (ANDIS)' should be 'ANDIS'
'Malmö Diet and Cancer (MDC)' should be 'MDC'
'Children's Hospital of Philadelphia (CHOP)' should be 'CHOP'
'Baependi Heart Study' should be 'Baependi'
The following should be changed to 'Estonia':
'EstBB' should be changed to 'EB'
'GenerationR' should be 'Generation_R'
The following should be 'GERA':
Created a dev ticket for this: https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1111
Cohort data needs cleaning up before sharing with users, e.g. all instances of UKB/UKBB to be harmonised, RSI-III to be combined, special characters removed.
@earlEBI to add details here of what needs doing and @ljwh2 to follow up with developers.
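
For the follow-up with developers, the rules listed above could be sketched roughly as below. This is only an illustration, not the actual goci implementation: the function name `clean_cohort_field` and the `ALIASES` table are hypothetical, the 'UKBB' entry is inferred from the issue description, and the variant lists for 'RS', 'Estonia' and 'GERA' are left out because they are not spelled out in this thread. Dropping empty entries after splitting is also an assumption.

```python
import re

# Renames taken from the rules in this issue. Keys are lowercased so that
# lookups ignore letter case. This table is hypothetical and incomplete:
# the variants for 'RS', 'Estonia' and 'GERA' are not listed in the thread.
ALIASES = {
    "ukbb": "UKB",  # inferred from "UKB/UKBB to be harmonised"
    "others": "other",
    "multiple cohorts": "multiple",
    "genscot": "GS:SFHS",
    "all new diabetics in scania (andis)": "ANDIS",
    "malmö diet and cancer (mdc)": "MDC",
    "children's hospital of philadelphia (chop)": "CHOP",
    "baependi heart study": "Baependi",
    "estbb": "EB",
    "generationr": "Generation_R",
}

# Debris to strip: a real byte order mark and its mis-decoded form.
DEBRIS = ("\ufeff", "Ôªø")


def clean_cohort_field(raw: str) -> str:
    """Return a pipe-separated cohort field with the rules above applied."""
    for junk in DEBRIS:
        raw = raw.replace(junk, "")
    cleaned = []
    # Commas and forward slashes become pipes, so split on all three.
    for part in re.split(r"[,/|]", raw):
        name = part.strip()  # removes leading and trailing spaces
        if not name:
            continue  # assumption: empty entries are dropped
        # Case-insensitive lookup; unknown names pass through unchanged.
        cleaned.append(ALIASES.get(name.lower(), name))
    return "|".join(cleaned)
```

With this sketch, `clean_cohort_field("UKBB, GenScot/ESTBB ")` returns `"UKB|GS:SFHS|EB"`. Keeping the renames in a single table means the missing variant lists can be added later without touching the function.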