dangeles / TissueEnrichmentAnalysis

This repository holds the scripts for a tissue enrichment tool that uses a hypergeometric test and is suited to C. elegans data.
MIT License
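For readers new to the method, the hypergeometric test mentioned above asks how surprising the overlap is between a submitted gene list and a tissue's annotated genes. Below is a minimal standard-library sketch of an upper-tail p-value and the expected overlap. The function names are hypothetical, and TEA's own implementation (how it builds the background and corrects for multiple testing) may differ.

```python
from math import comb


def expected_overlap(total_genes: int, tissue_genes: int, list_size: int) -> float:
    """Expected number of tissue-annotated genes in a random list of this size."""
    return list_size * tissue_genes / total_genes


def hypergeom_enrichment_pvalue(total_genes: int, tissue_genes: int,
                                list_size: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= overlap).

    total_genes  -- N, genes in the background set
    tissue_genes -- K, background genes annotated to the tissue
    list_size    -- n, genes in the submitted list
    overlap      -- k, submitted genes annotated to the tissue
    """
    upper = min(tissue_genes, list_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(tissue_genes, x) * comb(total_genes - tissue_genes, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, list_size)


# Toy numbers for illustration only: 20 background genes, 5 in the tissue,
# a list of 4 genes, 3 of which are tissue genes.
print(expected_overlap(20, 5, 4))                # 1.0
print(hypergeom_enrichment_pvalue(20, 5, 4, 3))  # ~0.032
```

`math.comb` requires Python 3.8 or later.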

Bug in pip installed version #8

Closed: Munfred closed this issue 3 years ago

Munfred commented 3 years ago

Hello, I'm using Python 3.7.4. I pip-installed tea and tried to test it on a CSV with 3 genes, and I ran into this error. I'm not sure where the issue is.

Is there an API to query the web server directly, or are the only two options the GUI or running it locally with pip?

[07:43:03] (base) [edaveiga@login2 sternberg]$ tea genes.csv mytitle tissue
Traceback (most recent call last):
  File "/home/edaveiga/anaconda3/bin/tea", line 104, in <module>
    alpha=q, show=False)
  File "/home/edaveiga/anaconda3/lib/python3.7/site-packages/tissue_enrichment_analysis/hypergeometricTests.py", line 178, in enrichment_analysis
    df_final.columns = ['Term', 'Expected']
  File "/home/edaveiga/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5143, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 66, in pandas._libs.properties.AxisProperty.__set__
  File "/home/edaveiga/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 564, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/edaveiga/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 227, in set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

Munfred commented 3 years ago

Good news: I was able to run the TEA tutorial notebook as provided on Colab (https://colab.research.google.com/gist/Munfred/38293959acd42294579a4d7084e8ef95/tutorial.ipynb)

That made me realize that I had formatted the CSV file incorrectly. I had provided it as

WBGene00000067, WBGene00000086, WBGene00000135

But it should have one gene per line:

WBGene00000241
WBGene00000267 
WBGene00000086
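If a gene list arrives comma-separated on a single line, as in the first attempt above, a small script can rewrite it one ID per line before calling `tea`. This is a hypothetical helper, not part of TEA, and it assumes IDs follow the `WBGene` plus eight digits pattern seen in this thread.

```python
import re
from pathlib import Path
from typing import List

# Assumed ID format, based on the IDs in this thread: "WBGene" + 8 digits.
WBGENE_ID = re.compile(r"WBGene\d{8}")


def normalize_gene_list(text: str) -> List[str]:
    """Split on commas and/or whitespace, validate IDs, drop duplicates (order kept)."""
    genes: List[str] = []
    seen = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue  # e.g. left over from a trailing comma
        if not WBGENE_ID.fullmatch(token):
            raise ValueError(f"not a WormBase gene ID: {token!r}")
        if token not in seen:
            seen.add(token)
            genes.append(token)
    return genes


def write_one_per_line(text: str, path: str) -> List[str]:
    """Write the normalized IDs to `path`, one per line, and return them."""
    genes = normalize_gene_list(text)
    Path(path).write_text("\n".join(genes) + "\n")
    return genes
```

For example, `write_one_per_line("WBGene00000067, WBGene00000086, WBGene00000135", "genes.csv")` would produce a file in the one-gene-per-line layout. Raising on anything that does not look like an ID makes a malformed file fail loudly before it reaches `tea`.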

Thanks!

dangeles commented 3 years ago

Glad this could be solved! Let me know if I can help in any other way.

Munfred commented 3 years ago

Actually, since you're here, I want to pick your brain on another question:

I'm generating the list of genes I give to TEA from a differential expression result on C. elegans single-cell RNA-seq data, so I have both significantly enriched and significantly depleted genes. However, I should only give TEA the significantly enriched genes.

Would you have an idea whether there's a simple way I could use the list of significantly depleted genes to confirm the TEA results? I guess that would be a thing of its own: Tissue Depletion Analysis, hah.

dangeles commented 3 years ago

I have used two approaches:

A) Pass the complete list to TEA.
B) Pass the UP and DOWN genes separately to TEA.

I could never decide which approach is better. My hypothesis is that tissues are as likely to upregulate genes as to downregulate them when responding to stimuli, so splitting decreases your power to detect tissues. But some people think that pathways are turned ON or OFF and will cluster by tissue, in which case splitting increases your power.

So I did both.
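Both strategies start from the same differential expression table. A minimal sketch of producing the complete, UP, and DOWN lists, assuming each result carries a gene ID, a log2 fold change, and a q-value; the field names and the 0.05 cutoff are illustrative assumptions, not TEA requirements.

```python
from typing import Dict, Iterable, List, NamedTuple


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical field names)."""
    gene: str
    log2_fold_change: float
    qvalue: float


def split_de_genes(results: Iterable[DEResult],
                   q_cutoff: float = 0.05) -> Dict[str, List[str]]:
    """Return the complete significant list plus separate UP and DOWN lists."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        if r.qvalue >= q_cutoff:
            continue  # not significant at the chosen cutoff
        if r.log2_fold_change > 0:
            up.append(r.gene)
        elif r.log2_fold_change < 0:
            down.append(r.gene)
    # Approach A submits "all"; approach B submits "up" and "down" separately.
    return {"all": up + down, "up": up, "down": down}
```

Each returned list can then be written one gene per line and passed to TEA.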

At some point, I thought about implementing a ranked gene test, but never got around to it. You'd be welcome to develop it though!
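One generic option for a ranked test is a Wilcoxon rank-sum (Mann-Whitney U) test: rank all genes by their differential expression statistic and ask whether a tissue's genes sit nearer the top than the rest. The sketch below uses that swapped-in technique; it is not something TEA implements. It applies a normal approximation without tie correction, and the tissue gene set is a hypothetical input.

```python
import math
from typing import Iterable, Sequence, Tuple


def rank_sum_enrichment(ranked_genes: Sequence[str],
                        tissue_genes: Iterable[str]) -> Tuple[float, float]:
    """One-sided Wilcoxon rank-sum / Mann-Whitney U test.

    ranked_genes -- every tested gene, unique, ordered from most upregulated
                    (index 0) to most downregulated
    tissue_genes -- genes annotated to one tissue (hypothetical input)

    Returns (U, p); a small p means the tissue genes sit nearer the top.
    """
    tissue = set(tissue_genes)
    n = len(ranked_genes)
    # Give the top gene rank n and the bottom gene rank 1, so a large rank
    # sum means "near the top". Positions are unique, so there are no ties.
    tissue_ranks = [n - i for i, g in enumerate(ranked_genes) if g in tissue]
    n1 = len(tissue_ranks)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need ranked genes both inside and outside the tissue set")
    u = sum(tissue_ranks) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p
```

Because it uses the whole ranking, this kind of test sidesteps the question of whether to split UP and DOWN genes. Testing the bottom of the list means reversing `ranked_genes`. With many tissues, the resulting p-values would still need a multiple-testing correction.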

Munfred commented 3 years ago

Ah, I see. I think I'll go with providing the full list, since then there's only one list of results to look at instead of two, and I'm doing this for every cluster in the single-cell data, which means several dozen times.
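Running TEA for dozens of clusters is easy to script around the command-line call shown at the top of the thread (`tea genes.csv mytitle tissue`). The following sketch assumes the positional arguments behave as in that invocation. The helper name and file layout are hypothetical, and by default it only builds the commands without running them.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def run_tea_per_cluster(cluster_genes: Dict[str, List[str]], outdir: str,
                        dry_run: bool = True) -> List[List[str]]:
    """Write one gene file per cluster and run `tea <file> <title> tissue` on each.

    Hypothetical helper: the argument order mirrors the invocation in this
    thread; verify it against TEA's documentation before relying on it.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    commands: List[List[str]] = []
    for cluster, genes in sorted(cluster_genes.items()):
        gene_file = out / f"cluster_{cluster}_genes.csv"
        gene_file.write_text("\n".join(genes) + "\n")  # one gene per line
        cmd = ["tea", str(gene_file), f"cluster_{cluster}", "tissue"]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing cluster
    return commands
```

Keeping `dry_run=True` as the default lets you inspect the generated files and commands before launching dozens of runs.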

Do you have an example reference for the ranked gene test I could look at? I might try to implement it depending on how things go.

Thanks!