j2moreno closed this issue 4 years ago
Extracting fields from the JSON to get merged SNPs:
Some SNP IDs have no `merged_into` value (the field is an empty list in the record below), so they are not included when creating the Rsmerge flat file:
```
2020-03-16 15:41:48 svc-3024-5-8.rc.usf.edu DEBUG(1): rs748938867 in file 12 has no merge info!
{'citations': [],
 'create_date': '2015-04-1T22:25Z',
 'dbsnp1_merges': [],
 'last_update_build_id': '152',
 'last_update_date': '2018-10-12T18:51Z',
 'lost_obs_movements': [{'allele_in_cur_release': {'deleted_sequence': 'AAAT',
                                                   'inserted_sequence': 'AAAT',
                                                   'position': 21541580,
                                                   'seq_id': 'NC_000014.9'},
                         'allele_in_prev_release': {'deleted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                    'inserted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                    'position': 22009718,
                                                    'seq_id': 'NC_000014.8'},
                         'component_ids': [{'type': 'subsnp',
                                            'value': '1710625764'},
                                           {'type': 'subsnp',
                                            'value': '1710625767'}],
                         'observation': {'deleted_sequence': 'AAAT',
                                         'inserted_sequence': 'AAAT',
                                         'position': 22009726,
                                         'seq_id': 'NC_000014.8'},
                         'rsids_in_cur_release': ['71419142']},
                        {'allele_in_cur_release': {'deleted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                   'inserted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                   'position': 21541576,
                                                   'seq_id': 'NC_000014.9'},
                         'allele_in_prev_release': {'deleted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                    'inserted_sequence': 'AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAA',
                                                    'position': 22009718,
                                                    'seq_id': 'NC_000014.8'},
                         'component_ids': [{'type': 'subsnp',
                                            'value': '1710625764'},
                                           {'type': 'subsnp',
                                            'value': '1710625767'}],
                         'observation': {'deleted_sequence': 'AAAT',
                                         'inserted_sequence': '',
                                         'position': 22009726,
                                         'seq_id': 'NC_000014.8'},
                         'rsids_in_cur_release': ['71419142']}],
 'merged_snapshot_data': {'merged_into': [],
                          'proxy_build_id': '152',
                          'proxy_time': '2018-10-12T18:51Z'},
 'present_obs_movements': [],
 'refsnp_id': '748938867'}
```
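
To make the failing case concrete, here is a minimal sketch of pulling the merge target out of one record and detecting an empty `merged_into`. The function name `extract_merge` is hypothetical and is not taken from `snptk-parse-rsmerge-json.py`. Collecting `rsids_in_cur_release` from `lost_obs_movements` is only an assumption about where a usable rsid might be found for records like rs748938867; whether those values are valid merge targets would need to be confirmed against the dbSNP schema.

```python
import json


def extract_merge(line):
    """Return (refsnp_id, merged_into, candidate_rsids) for one JSON record.

    `line` is one JSON object serialized as text (assumed layout: one
    record per line in refsnp-merged.json).
    """
    record = json.loads(line)
    rsid = record["refsnp_id"]

    # Normal case: the merge target lives in merged_snapshot_data.merged_into.
    merged_into = record.get("merged_snapshot_data", {}).get("merged_into", [])

    # Assumption (not confirmed by the source): when merged_into is empty,
    # rsids_in_cur_release inside lost_obs_movements may hint at the
    # current rsid. They are only collected here, not treated as a merge.
    candidates = []
    for movement in record.get("lost_obs_movements", []):
        for current in movement.get("rsids_in_cur_release", []):
            if current not in candidates:
                candidates.append(current)

    return rsid, merged_into, candidates
```

For the record shown above, `merged_into` comes back empty, which is the condition that triggers the `has no merge info!` debug message, while the candidate list contains `71419142`.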
Because of the size of the Rsmerge file, https://ftp.ncbi.nih.gov/snp/latest_release/JSON/refsnp-merged.json.bz2 was split into 32 files using snptk-split before being processed with snptk-parse-rsmerge-json.py.
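
To gauge how many records are affected in a given chunk, something like the following sketch could be used. It assumes each chunk is a bz2-compressed file with one JSON record per line; the helper names `iter_records` and `count_missing_merges` are hypothetical and are not part of snptk.

```python
import bz2
import json


def iter_records(path):
    """Yield one parsed record per non-empty line of a bz2-compressed file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_missing_merges(path):
    """Count records whose merged_snapshot_data.merged_into is empty or absent."""
    missing = 0
    for record in iter_records(path):
        merged_into = record.get("merged_snapshot_data", {}).get("merged_into", [])
        if not merged_into:
            missing += 1
    return missing
```

Because the file is read line by line, memory use stays small regardless of the chunk size.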