cancerDHC / umls-rrf-scala

A very basic library for parsing files in the UMLS RRF format
MIT License

Optimizations and bug fixes #13

Closed gaurav closed 3 years ago

gaurav commented 3 years ago

This PR improves UMLS-RRF-Scala in two ways:

  1. The Filler module now prepares the output as a LazyList, rather than mapping the entire list in one go.
  2. It standardizes the cache time period to None (i.e. cache indefinitely), rather than the overly short two seconds we were using earlier.
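To illustrate the first change, here is a minimal sketch (Scala 2.13 or later) of the difference between mapping a whole list eagerly and mapping it as a LazyList. The `fill` function and the row strings are hypothetical stand-ins, not the actual Filler code; the counter is only there to show when the work happens.

```scala
object LazyFillSketch {
  // Counts how many rows have actually been processed so far.
  private var evaluated = 0

  // Hypothetical stand-in for the per-row work done by a Filler.
  def fill(row: String): String = {
    evaluated += 1
    s"$row|filled"
  }

  val rows = List("row1", "row2", "row3")

  // Eager: List.map processes every row before it returns.
  val eager: List[String] = rows.map(fill)
  val evaluatedEagerly: Int = evaluated

  // Lazy: LazyList.map only describes the work; nothing runs yet.
  evaluated = 0
  val lazyRows: LazyList[String] = rows.to(LazyList).map(fill)
  val evaluatedBeforeUse: Int = evaluated

  // Asking for the first element processes just that one row.
  val firstRow: String = lazyRows.head
  val evaluatedAfterHead: Int = evaluated
}
```

With a LazyList, a consumer that writes rows out one at a time only pays for the rows it has reached, which is presumably the motivation for the change.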

It also fixes two minor bugs:

  1. We were previously not reading the --fill-predicate-id command line option correctly: if the value was surrounded by single quotes, the quotes were included as part of the predicate ID, so it wouldn't match correctly. This PR changes that behavior so that (1) comma-separated values can be used to provide multiple predicate IDs, and (2) each predicate ID is checked to see if it has an initial and final single quote; if so, both are removed.
  2. We were using SSSOM:-prefixed CURIEs for some Fillers, while others used the full URI (http://purl.org/sssom/type/). This PR standardizes that so we only use the full URI.
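The two fixes can be sketched roughly as follows. This is not the code from this PR: the object and function names are hypothetical, the whitespace trimming and the dropping of empty entries are assumptions, and the sketch assumes that the SSSOM: prefix expands to the URI given above.

```scala
object PredicateIdSketch {
  // Full URI base mentioned in the PR description.
  val SssomTypeBase = "http://purl.org/sssom/type/"

  /** Split a --fill-predicate-id value on commas and remove one pair of
    * surrounding single quotes from each predicate ID.
    * Trimming whitespace and dropping empty entries are assumptions.
    */
  def parsePredicateIds(raw: String): Seq[String] =
    raw.split(',').toSeq
      .map(_.trim)
      .map { id =>
        // Strip quotes only when BOTH the first and last character are quotes.
        if (id.length >= 2 && id.startsWith("'") && id.endsWith("'"))
          id.substring(1, id.length - 1)
        else id
      }
      .filter(_.nonEmpty)

  /** Expand an SSSOM:-prefixed CURIE to the full URI; leave other values as-is. */
  def expandSssomType(id: String): String =
    if (id.startsWith("SSSOM:")) SssomTypeBase + id.stripPrefix("SSSOM:")
    else id
}
```

For example, `PredicateIdSketch.parsePredicateIds("'skos:exactMatch',skos:closeMatch")` yields the two IDs without quotes; the predicate IDs here are illustrative only.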