Most rows in the CID file hold one CID/Synonym pair, delimited by a single space. In some rows, though, the CID and the Synonym are separated by a Tab. The normalize script does not split these rows, because the delimiter we use on line 50 does not match Tabs.
CIDs where the delimiter issue happens:
76540
22628
@enicolasgomez The problem comes from the way we parse the CID file in the normalize script: https://github.com/MetaCell/asu-olfactory/blob/feature/20/applications/pub-chem-index/tasks/ingestion/normalize.py#L50
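
A minimal sketch of one possible fix, assuming each row holds a CID followed by a single Synonym. The function name `split_cid_row` is hypothetical and is not taken from normalize.py. It splits on the first run of whitespace, so rows delimited by a space and rows delimited by a Tab are both handled, and any spaces inside the Synonym are kept.

```python
def split_cid_row(row):
    """Split one CID file row into (cid, synonym).

    Hypothetical helper, not the code currently in normalize.py.
    Assumes the CID comes first and the rest of the row is the synonym.
    """
    # str.split with sep=None treats any run of whitespace (spaces or tabs)
    # as the delimiter; maxsplit=1 keeps spaces inside the synonym intact.
    parts = row.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"cannot split row into CID and synonym: {row!r}")
    cid, synonym = parts
    return cid, synonym
```

The synonym text used to try this out would be placeholder data; only the CIDs 76540 and 22628 come from the report above. Rows that cannot be split raise an error instead of being passed through silently, which would make this kind of delimiter problem easier to spot during ingestion.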