csbl-br / wikiora

Flask app for gene over-representation analysis based on Wikidata.
https://wikiora.sysbio.tools
MIT License

feat: Randomize also the way genes are separated when clicking the randomize button (, ; | tab \n ) #26

Closed lubianat closed 1 month ago

lubianat commented 1 month ago

@jvfe By the way, I am reopening this for the future as a super minor issue:

maybe it is not optimal to pick all the separators with the same frequency? Not sure. The ";"-separated list is quite ugly lol

Also, the initial randomize was adding a space after the ",". Maybe the space should be kept? It doesn't affect the processing, it's just a visual thing.

jvfe commented 1 month ago

Sure! I can do that once I have the time (which tbh is pretty short as of late 😆). It should be simple to do, just using https://docs.python.org/dev/library/random.html#random.choices and providing a separate weight for each element. E.g. giving "," a higher weight would lead to it being chosen more often.

Does the space not matter with any separator? I.e. "GeneA; GeneB"? Cause if so that's a simple change as well.
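
A minimal sketch of the weighted pick described above, not the actual wikiora code. The names `SEPARATORS`, `WEIGHTS`, `pick_separator` and `format_gene_list` are hypothetical, and the weight values are made up purely to show the idea. The pipe from the issue title is left out here because the parsing regex quoted later in this thread does not split on it.

```python
import random

# Hypothetical separator list and weights, for illustration only.
# "," and ";" keep a trailing space, as discussed above.
SEPARATORS = [", ", "; ", "\t", "\n"]
WEIGHTS = [6, 1, 2, 3]  # made-up values: "," is favoured, ";" is rare


def pick_separator(weights=WEIGHTS):
    # random.choices returns a list of k items, so take the only element.
    return random.choices(SEPARATORS, weights=weights, k=1)[0]


def format_gene_list(genes, weights=WEIGHTS):
    # Join the example genes with one randomly picked separator.
    return pick_separator(weights).join(genes)


print(repr(format_gene_list(["GeneA", "GeneB", "GeneC"])))
```

The weights are relative, so `[6, 1, 2, 3]` means "," would be picked about half the time. Setting a weight to 0 would disable that separator without removing it from the list.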

lubianat commented 1 month ago

The function for parsing the list is stupidly straightforward:

import re

def parse_gene_list(gene_list):
    # Split on any run of whitespace, commas, or semicolons.
    genes = re.split(r"[\s,;]+", gene_list.strip())
    return [gene for gene in genes if gene]

So I guess it should work with any separator.
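
A quick check of that guess, as a sketch: the function is copied from the comment above, and the inputs are made-up examples. Because `\s` is in the character class, a space after "," or ";" is absorbed into the same split. One thing to watch: the pipe listed in the issue title is not in the character class as quoted, so it would not be treated as a separator. `parse_gene_list_with_pipe` is a hypothetical variant showing one way that could be handled.

```python
import re


def parse_gene_list(gene_list):
    # Same function as quoted in the thread.
    genes = re.split(r"[\s,;]+", gene_list.strip())
    return [gene for gene in genes if gene]


def parse_gene_list_with_pipe(gene_list):
    # Hypothetical variant: "|" added to the character class
    # (inside [...] it is a literal pipe, not alternation).
    genes = re.split(r"[\s,;|]+", gene_list.strip())
    return [gene for gene in genes if gene]


examples = ["GeneA, GeneB", "GeneA; GeneB", "GeneA\tGeneB", "GeneA\nGeneB", "GeneA|GeneB"]
for text in examples:
    print(repr(text), "->", parse_gene_list(text), "/", parse_gene_list_with_pipe(text))
```

With the quoted regex, the first four inputs split into two genes each, while "GeneA|GeneB" comes back as a single item. So if the randomize button ever emits a pipe, the parser would likely need the extra character.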