User-created genesets currently have long random primary _id strings such as f78lQ4MBbFFuJZ6h9lGW, which are generated by default by Elasticsearch.
We could make these IDs easier to read and remember by generating custom 5- or 6-character strings instead, for example from the Base62 alphabet (0-9, A-Z, a-z).
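A minimal sketch of generating such a string in Python. It assumes each character is drawn independently and uniformly at random using the standard-library secrets module. The alphabet ordering and the function name random_base62 are illustrative assumptions, not an existing implementation.

```python
import secrets
import string

# Base62 alphabet: 10 digits + 26 uppercase + 26 lowercase letters = 62 symbols.
# The ordering is an assumption; any fixed ordering of these 62 symbols works.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(length: int = 6) -> str:
    """Return a random Base62 string of the given length (hypothetical helper).

    secrets.choice draws each character uniformly from the alphabet
    using a cryptographically strong random source.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(BASE62_ALPHABET) for _ in range(length))
```

For example, random_base62(6) returns a 6-character string made only of digits and ASCII letters; the exact value differs on every call.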
Base62 uniqueness metrics
5 characters in Base62 give 62^5 = 916,132,832 (~1 billion) unique IDs. At 10k IDs per day, the space would not run out for 91k+ days (about 250 years).
6 characters in Base62 give 62^6 = 56,800,235,584 (56+ billion) unique IDs. At 10k IDs per day, the space would not run out for 5+ million days (over 15,000 years).
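The capacity figures above can be reproduced with a few lines of Python. This only checks the arithmetic (size of the ID space and whole days until every ID would be used at 10,000 IDs per day); the helper names are illustrative.

```python
# Issuance rate used in the figures above.
IDS_PER_DAY = 10_000


def id_space(length: int, base: int = 62) -> int:
    """Number of distinct strings of `length` characters over a `base`-symbol alphabet."""
    return base ** length


def days_until_exhausted(length: int, ids_per_day: int = IDS_PER_DAY) -> int:
    """Whole days before every ID of the given length has been issued (floor division)."""
    return id_space(length) // ids_per_day


for n in (5, 6):
    print(f"{n} chars: {id_space(n):,} IDs, ~{days_until_exhausted(n):,} days")
```

Running it prints 916,132,832 IDs and ~91,613 days for 5 characters, and 56,800,235,584 IDs and ~5,680,023 days for 6 characters, matching the figures above.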
Additionally, we might want to prepend a CURIE-style prefix to our user-created IDs; this would let us distinguish the _ids of user-created genesets at a glance.
The proposed prefix for these IDs is mygst:. With 6-character Base62 strings, a typical _id would look like: mygst:UPrGAT
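A minimal sketch of combining the mygst: prefix with a random Base62 suffix, plus a check that recognizes user-created IDs on sight. Because suffixes are drawn at random, two draws can collide long before the space is exhausted, so this sketch retries against a hypothetical exists callback (for instance, a lookup in the geneset index). The function names, the callback, and the retry limit are all assumptions for illustration.

```python
import secrets
import string
from typing import Callable

BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
PREFIX = "mygst:"  # CURIE-style prefix proposed for user-created genesets


def new_geneset_id(exists: Callable[[str], bool], length: int = 6, max_tries: int = 10) -> str:
    """Return a prefixed Base62 _id that the hypothetical `exists` callback reports as unused.

    Random draws can collide, so a bounded number of attempts is made.
    """
    for _ in range(max_tries):
        suffix = "".join(secrets.choice(BASE62_ALPHABET) for _ in range(length))
        candidate = PREFIX + suffix
        if not exists(candidate):
            return candidate
    raise RuntimeError(f"no unused id found after {max_tries} attempts")


def is_user_geneset_id(doc_id: str) -> bool:
    """True if doc_id is the prefix followed by a non-empty Base62 suffix."""
    if not doc_id.startswith(PREFIX):
        return False
    suffix = doc_id[len(PREFIX):]
    return len(suffix) > 0 and all(c in BASE62_ALPHABET for c in suffix)
```

For example, is_user_geneset_id("mygst:UPrGAT") is True while is_user_geneset_id("f78lQ4MBbFFuJZ6h9lGW") is False. A check-then-write lookup like this can still race between concurrent writers; a more robust variant would let the store reject duplicates, e.g. indexing with op_type=create, which Elasticsearch refuses when a document with that _id already exists, and retrying on that conflict.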