WormBase / wormicloud

Interaction data analysis with word clouds

Read gene names from postgres pap_gene table instead of Textpresso categories #38

Open valearna opened 3 years ago

valearna commented 3 years ago

Textpresso category queries are too slow, and pap_gene already contains many gene names. Future automated extraction pipelines should populate pap_gene with gene names matching those extracted by tpc.

valearna commented 3 years ago

For now we can continue to use the list of genes from Textpresso, but for papers that mention a high fraction of all genes (# genes in paper / total # C. elegans genes), we could remove genes mentioned only once. This would take care of high-throughput experiments. Reading genes from postgres would still be faster, but we need to wait for a pipeline that can extract genes from full text and not only from abstracts.
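
A minimal sketch of the filtering heuristic described above, in Python. The function name `filter_gene_mentions`, the `max_gene_fraction` threshold and its default value, and the `min_mentions` parameter are all hypothetical; the thread does not specify a cutoff, so the numbers here are placeholders to tune.

```python
from collections import Counter
from typing import Dict, Iterable


def filter_gene_mentions(
    mentions: Iterable[str],
    total_genes: int,
    max_gene_fraction: float = 0.05,  # assumed cutoff, not from the issue
    min_mentions: int = 2,            # "mentioned only once" means count < 2
) -> Dict[str, int]:
    """Return gene -> mention count, dropping rarely mentioned genes
    only when the paper mentions a large fraction of all genes."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")

    counts = Counter(mentions)

    # Fraction of all known genes that appear in this paper.
    gene_fraction = len(counts) / total_genes

    # Ordinary paper: keep every gene, including single mentions.
    if gene_fraction <= max_gene_fraction:
        return dict(counts)

    # Likely high-throughput paper: keep only genes mentioned repeatedly.
    return {gene: n for gene, n in counts.items() if n >= min_mentions}
```

With this shape, a paper listing thousands of genes once each contributes almost nothing to the word cloud, while a paper discussing a handful of genes keeps all of them, even those mentioned a single time.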