EBISPOT / xgwas-curator-tasks

An internal repo for GWAS curators to track issues

Clean up cohort data #20

Closed by ljwh2 1 year ago

ljwh2 commented 1 year ago

Cohort data needs cleaning up before sharing with users, e.g. all instances of UKB/UKBB to be harmonised, RSI-III to be combined, special characters removed.

@earlEBI to add details here of what needs doing and @ljwh2 to follow up with developers.

earlEBI commented 1 year ago

The following should be changed to 'UKB':

'others' should be 'other'

'multiple cohorts' should be 'multiple'

The following should all be changed to 'RS'

'GenScot' should be 'GS:SFHS'

Any commas within the cohort field should be changed to pipes

Any forward slashes within the cohort field should be changed to pipes

Trailing spaces should be removed

'Special characters', e.g. 'Ôªø', should be removed

Note: Letter case should be disregarded when consolidating
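For the developers, here is a rough sketch of how the rules above could be applied to a single cohort field. It is illustrative only, not the actual curation code. The function name `clean_cohort_field` and the `ALIASES` table are hypothetical. The alias table holds only the variants spelled out in this thread, so the full UKB and RS variant lists would still need to be added. It also assumes that 'Ôªø' is a mis-decoded byte-order mark, and that leading spaces can be stripped along with trailing ones.

```python
import re

# Hypothetical alias table: keys are lower-cased variants, values are the
# canonical labels named in this thread. Only variants spelled out in the
# thread are listed; the full UKB and RS lists are not reproduced here.
ALIASES = {
    "ukbb": "UKB",
    "others": "other",
    "multiple cohorts": "multiple",
    "genscot": "GS:SFHS",
}

# Debris to strip: the sequence quoted in the thread, plus a real
# byte-order mark (assumption: the quoted sequence is a mis-decoded BOM).
DEBRIS = ("Ôªø", "\ufeff")


def clean_cohort_field(value: str) -> str:
    """Return the cohort field with the clean-up rules applied."""
    for junk in DEBRIS:
        value = value.replace(junk, "")

    cleaned = []
    # Commas and forward slashes are treated as separators, like pipes.
    for part in re.split(r"[,/|]", value):
        # Removes trailing spaces; also leading ones (assumption).
        name = part.strip()
        if not name:
            continue
        # Case-insensitive lookup; unknown names are kept unchanged.
        cleaned.append(ALIASES.get(name.lower(), name))

    return "|".join(cleaned)


if __name__ == "__main__":
    print(clean_cohort_field("Ôªøukbb, GenScot/Others "))
```

This sketch does not remove duplicates, so a field that lists both 'UKB' and 'UKBB' would come out as 'UKB|UKB'. Whether such entries should be collapsed is a separate decision.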

earlEBI commented 1 year ago

Part 2 - smaller issues:

'All New Diabetics In Scania (ANDIS)' should be 'ANDIS'

'Malmö Diet and Cancer (MDC)' should be 'MDC'

'Children's Hospital of Philadelphia (CHOP)' should be 'CHOP'

'Baependi Heart Study' should be 'Baependi'

The following should be changed to 'Estonia':

'EstBB' should be changed to 'EB'

'GenerationR' should be 'Generation_R'

The following should be 'GERA':

ljwh2 commented 1 year ago

Created a dev ticket for this: https://app.zenhub.com/workspaces/gwas-59df823c4a6feb3786810391/issues/gh/ebispot/goci/1111