ialzuru opened this issue 5 years ago (status: Open)
The current Parquet files record the hash of the dataset each record came from, but not the specific file or row number within that dataset. If a dataset contains two copies of the same record, there is no easy way to distinguish them consistently.
https://github.com/bio-linker/organization/wiki/2019-08-23-Work-Session-Notes
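To illustrate what is being asked for, here is a minimal sketch of tagging each record with its source file and row number alongside the dataset hash. Everything in it is an assumption made for illustration, not the project's actual schema or pipeline: the function `tag_rows`, the column names `dataset_hash`, `source_file`, and `source_row`, the file name `occurrence.txt`, and the placeholder hash are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def tag_rows(dataset_hash: str, file_name: str, lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield each CSV record with hypothetical provenance columns attached.

    source_row counts data rows starting at 1 (the header row is not counted).
    """
    reader = csv.DictReader(lines)
    for row_number, record in enumerate(reader, start=1):
        yield {
            **record,
            "dataset_hash": dataset_hash,  # already present in the current files
            "source_file": file_name,      # proposed: file within the dataset
            "source_row": row_number,      # proposed: row within that file
        }


# Two identical records in the same (hypothetical) file.
sample = io.StringIO("id,scientificName\n1,Puma concolor\n1,Puma concolor\n")
tagged = list(tag_rows("<dataset-hash>", "occurrence.txt", sample))

for row in tagged:
    print(row["source_file"], row["source_row"], row["scientificName"])
```

With these extra columns, the two copies share every original field and the dataset hash but differ in `source_row`, so the triple (dataset hash, file, row number) identifies each one consistently across runs.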