Closed: almosnow closed this issue 2 months ago.
Every distinct sOTU in palmDB is identified by a unique number, such as u16. This is not always easy for humans to read or remember, so I implemented a nickname system that gives each sOTU a random, colorful nickname to make it more memorable.
The format of the palmprint nickname is <adjective><Noun>. Each adjective and noun is one word long, and the two are joined in camel case as adjNoun. The lists of English adjectives and nouns were taken from the data.adj and data.noun files of WordNet (https://wordnet.princeton.edu/). The parsed lists of adjectives and nouns are attached here:
wordnet.adjNoun.zip
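A minimal sketch of how single-word lemmas could be pulled out of the WordNet data files and joined into an <adjective><Noun> nickname. It assumes the WordNet 3.0 data-file layout (license header lines start with a space; each data line reads synset_offset lex_filenum ss_type w_cnt word lex_id ..., with w_cnt in hexadecimal) and that multi-word lemmas (those containing an underscore) are dropped. The function names are hypothetical; this is not necessarily the code that produced wordnet.adjNoun.zip.

```python
import re
from typing import Iterable, List

# Adjective lemmas in data.adj may carry a syntactic marker such as "(a)", "(p)" or "(ip)".
_MARKER = re.compile(r"\([a-z]+\)$")


def parse_wordnet_lemmas(lines: Iterable[str]) -> List[str]:
    """Return unique single-word, purely alphabetic lemmas from WordNet data-file lines."""
    seen = set()
    lemmas = []
    for line in lines:
        if not line.strip() or line.startswith(" "):
            continue  # skip blank lines and the license header (lines start with spaces)
        fields = line.split()
        w_cnt = int(fields[3], 16)  # word count is a two-digit hexadecimal number
        for i in range(w_cnt):
            # Words sit at fields 4, 6, 8, ...; each is followed by its lex_id.
            word = _MARKER.sub("", fields[4 + 2 * i]).lower()
            # Keep one-word lemmas only: no underscores (multi-word), hyphens or digits.
            if word.isalpha() and word not in seen:
                seen.add(word)
                lemmas.append(word)
    return lemmas


def make_nickname(adjective: str, noun: str) -> str:
    """Join an adjective and a noun in camel case, e.g. ("pickled", "cuticle") -> "pickledCuticle"."""
    return adjective.lower() + noun.capitalize()
```

Lower-casing every lemma before deduplicating keeps the lists free of case variants, so each adjective/noun pair maps to exactly one camel-case nickname.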
sOTU Names wiki page (Artem)
sOTU.nickname.list file for palmDB

To do this, extract all currently assigned nicknames from palmdb2 and dump them to a 3-column, ordered TSV file, sorted by sOTU number in increasing order.
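A minimal sketch of the dump step, assuming the assigned (sotu, nickname) pairs have already been fetched from palmdb2 into Python. The detail worth getting right is the ordering: sOTU identifiers such as u16 and u47468 have to be sorted by their numeric part, because a plain string sort would put u100 before u16. The function names are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def sotu_number(sotu: str) -> int:
    """Numeric part of an sOTU identifier, e.g. "u47468" -> 47468."""
    return int(sotu.lstrip("u"))


def write_assigned_nicknames(pairs: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write a 3-column TSV (nickid, sotu, nickname) in increasing sOTU order.

    Returns the last nickid written so unassigned (NA) rows can continue the numbering.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["nickid", "sotu", "nickname"])
    nickid = 0
    # Sort numerically: a plain string sort would place "u100" before "u16".
    for sotu, nickname in sorted(pairs, key=lambda p: sotu_number(p[0])):
        nickid += 1
        writer.writerow([nickid, sotu, nickname])
    return nickid


if __name__ == "__main__":
    # Hypothetical rows standing in for the result of a palmdb2 query.
    buf = io.StringIO()
    write_assigned_nicknames([("u47468", "unposedSave"), ("u16", "skyKing")], buf)
    print(buf.getvalue(), end="")
```

Writing to any text stream (an open file or an in-memory buffer) keeps the function easy to test before pointing it at the real output file.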
sOTU.nickname.list example:
nickid sotu nickname
1 u16 skyKing
...
x u47468 unposedSave
...
n u301630 phallicUpdate
n+1 NA raisedCurrency
...
10000000 NA pickledCuticle
This file should be populated from the adjective and noun lists with up to ~10 million unique (non-repeating) nicknames. Unassigned nicknames are designated NA in the sotu column. This file will be used to fill the nickname column of the palmdb table. There is no need to create a table from this file; we can simply update the file as we update palmdb versions.
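One way the unassigned rows could be generated, sketched under assumptions: nicknames are drawn without repetition from the full adjective × noun product by sampling distinct indices (so the product is never materialized), nicknames already assigned in palmdb2 are skipped, and each new row gets sotu = NA and the next nickid. The function name, seed and sizes are illustrative, and this is not necessarily how the actual file was produced.

```python
import random
from typing import Iterator, List, Set, Tuple


def unassigned_rows(
    adjectives: List[str],
    nouns: List[str],
    assigned: Set[str],
    start_nickid: int,
    total: int,
    seed: int = 0,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (nickid, "NA", nickname) rows until nickid reaches `total`.

    Each adjective/noun pair maps to one index in range(len(adjectives) * len(nouns)),
    so distinct indices give distinct nicknames, assuming both lists are duplicate-free.
    """
    space = len(adjectives) * len(nouns)
    if total - start_nickid > space - len(assigned):
        raise ValueError("not enough adjective/noun combinations for the requested total")
    rng = random.Random(seed)  # fixed seed makes the file reproducible
    seen: Set[int] = set()
    nickid = start_nickid
    while nickid < total:
        idx = rng.randrange(space)
        if idx in seen:
            continue  # this pair was already drawn
        seen.add(idx)
        adj, noun = divmod(idx, len(nouns))
        nickname = adjectives[adj].lower() + nouns[noun].capitalize()
        if nickname in assigned:
            continue  # already given to an sOTU in palmdb2
        nickid += 1
        yield nickid, "NA", nickname
```

Rejection sampling stays fast as long as ~10 million is a small fraction of the adjective × noun product; the `seen` set of 10 million integers will still need several hundred MB in CPython.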
Found several duplicate nicknames assigned in the current palmdb2 table:
SELECT COUNT(nickname) FROM palmdb2 WHERE centroid = true;
513,176
SELECT COUNT(DISTINCT nickname) FROM palmdb2 WHERE centroid = true;
512,261
So 915 rows (513,176 − 512,261) carry a nickname already used by another sOTU: about a thousand, not huge.
How to proceed?
Suggestion: re-assign new nicknames to those duplicates.
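A sketch of that re-assignment, assuming the centroid rows are available as (sotu, nickname) pairs and a pool of unused nicknames is at hand (for example, rows with sotu = NA in sOTU.nickname.list). Keeping the nickname on the lowest-numbered sOTU is an assumed tie-break rule, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def reassign_duplicates(
    rows: Iterable[Tuple[str, str]], fresh: Iterator[str]
) -> Dict[str, str]:
    """Return {sotu: new_nickname} for every sOTU whose nickname is already taken.

    Rows are visited in increasing sOTU number, so the lowest-numbered sOTU keeps a
    contested nickname; `fresh` should yield nicknames not used by any sOTU.
    """
    taken = set()
    changes: Dict[str, str] = {}
    for sotu, nickname in sorted(rows, key=lambda r: int(r[0].lstrip("u"))):
        if nickname in taken:
            changes[sotu] = next(fresh)
            taken.add(changes[sotu])  # guard against a fresh name colliding later
        else:
            taken.add(nickname)
    return changes
```

Only the returned mapping needs to be written back, so roughly a thousand rows change while every other sOTU keeps its current nickname.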
Done.
The file is about half a GB, so I won't upload it here, but it's ready to be used at some point.
GitHub: https://github.com/serratus-bio/logan-backend/commit/edd9d782a34a2ec28202cc716ba59c147a9f2c9a
Get adverbs list/code from @ababaian