daler / metaseq

Framework for integrated analysis and plotting of ChIP/RIP/RNA/*-seq data
https://daler.github.io/metaseq
MIT License
87 stars 36 forks source link

genomic signal array #17

Closed kellermac closed 8 years ago

kellermac commented 9 years ago

I am having trouble implementing example 1. I have tried with the provided data as well as a data set I am interested in (bigWig). I am getting an error at the BamSignal.array step. The code follows:

if not os.path.exists('example.npz'): ... ip_array = ip_signal.array( ... tsses_1kb, ... bins=100, ... processes=processes) ... input_array = input_signal.array( ... tsses_1kb, ... bins=100, ... processes=processes) ... ip_array /= ip_signal.mapped_read_count() / 1e6 ... input_array /= input_signal.mapped_read_count() / 1e6 ... metaseq.persistence.save_features_and_arrays( ... features=tsses, ... arrays={'ip': ip_array, 'input': input_array}, ... prefix='example', ... link_features=True, ... overwrite=True) ... Traceback (most recent call last): File "", line 5, in File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 126, in array stacked_arrays = np.row_stack(arrays) File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/numpy/core/shape_base.py", line 228, in vstack return _nx.concatenate([atleast_2d(_m) for _m in tup], 0) ValueError: need at least one array to concatenate

I tried to set up just one array at a time:

input_array = input_signal.array( ... tsses_1kb, ... bins=100, ... processes=processes) Traceback (most recent call last): File "", line 4, in File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 126, in array stacked_arrays = np.row_stack(arrays) File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/numpy/core/shape_base.py", line 228, in vstack return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)

Hopefully you can point me to a solution. Thanks.

daler commented 9 years ago

Thanks for reporting this. I suspect this is due to some recent changes in bx-python and pysam (both dependencies of metaseq) recently, and I need to fix metaseq to support those changes. I'm currently working on this and will update here when things are fixed.

kellermac commented 9 years ago

Oh good. I was worried I was wasting your time. Thank you.

daler commented 9 years ago

The pysam issue actually causes a different error than you're reporting here. I added a small fix; can you try pulling the latest metaseq version from github and re-trying your code? The example 1 code works for this.

If it still fails, would you provide a minimal example that reproduces the error so I can use that for debugging?

Thanks.

kellermac commented 9 years ago

I updated metaseq with pip -install --upgrade metaseq. Now I get slightly different error reports.

if not os.path.exists('example.npz'): ... ip_array = ip_signal.array( ... tsses_1kb, ... chunksize=100, ... processes=processes) ... input_array = input_signal.array( ... tsses_1kb, ... chunksize=100, ... processes=processes) ... ip_array /= ip_signal.mapped_read_count() / 1e6 ... input_array /= input_signal.mapped_read_count() / 1e6 ... metaseq.persistence.save_features_and_arrays( ... features=tsses, ... arrays={'ip': ip_array, 'input': input_array}, ... prefix='example', ... link_features=True, ... overwrite=True) ... Traceback (most recent call last): File "", line 5, in File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq/_genomic_signal.py", line 122, in array chunksize=chunksize, **kwargs) File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq/array_helpers.py", line 383, in _array_parallel itertools.repeat(kwargs))) File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/multiprocessing/pool.py", line 251, in map return self.map_async(func, iterable, chunksize).get() File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/multiprocessing/pool.py", line 567, in get raise self._value TypeError: Expected bytes, got unicode

daler commented 9 years ago

OK, that's the pysam error. I haven't released another metaseq version to PyPI though, so you're not yet getting the fix.

To get the fix for now, download this zip file of the github repository. Unzip it, and in the unzipped directory run python setup.py install. Then try re-running your code.

kellermac commented 9 years ago

I updated like you said, but it seems metaseq lost some dependencies somehow?

import metaseq Traceback (most recent call last): File "", line 1, in File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq-0.5.5.4-py2.7.egg/metaseq/init.py", line 12, in import tableprinter File "/home/keller/miniconda/envs/metaseq-test/lib/python2.7/site-packages/metaseq-0.5.5.4-py2.7.egg/metaseq/tableprinter.py", line 1, in import fisher ImportError: No module named fisher

daler commented 9 years ago

Sorry about that -- I'm still setting up the new test infrastructure so that wasn't caught.

I've updated the git repo, can you please try again?

kellermac commented 9 years ago

Do I need to update with pip install --upgrade metaseq? (tried that didn't look changed) is there a better way?

daler commented 9 years ago

Using the git repo and installing with python setup.py develop is the best way.

In case you're not familiar with git, some more detailed instructions . . .

First clone the repository:

git clone https://github.com/daler/metaseq.git

Then change to the directory and install metaseq in development mode:

cd metaseq
python setup.py develop

Then, any time you want to incorporate updates from github, from inside the metaseq directory do this:

git pull

The files will update, and since you've installed in development mode, Python will find the updated files so there's no need to re-install.

kellermac commented 9 years ago

the python 2.7 that is included in miniconda/bin doesn't have setuptools, so updating this way doesn't work. Is it OK to use a different installation of python? Or should I try to upgrade my python in that directory?

daler commented 9 years ago

Try conda install setuptools to get setuptools in that environment.

Sorry about this hassle . . . I'm porting everything over to bioconda so hopefully installation will be much smoother in the future.

kellermac commented 9 years ago

So I cloned metaseq into my directory /home/keller/metaseq (the conda command was correct). But I am getting a similar error.

import metaseq Traceback (most recent call last): File "", line 1, in File "/home/keller/metaseq/metaseq/init.py", line 12, in import tableprinter File "/home/keller/metaseq/metaseq/tableprinter.py", line 1, in import fisher

Should I instead have replaced metaseq inside of the miniconda directory (I'm not sure where it is located or I would have tried this).

daler commented 9 years ago

In the new cloned directory, did you run python setup.py develop? It should have seen the new entry for fisher in requirements.txt and installed that.

Though that should work, if it doesn't you can do

conda install pip  # to make sure when you call pip it installs into your anaconda dir
pip install fisher
daler commented 9 years ago

I pushed a new version, 0.5.6, to PyPI and bioconda. Now (and in the future) you can install everything using bioconda:

conda install --channel bioconda metaseq

and to update,

conda update --channel bioconda metaseq

See the much-simplified installation docs for details.

Does this solve your issue?

kellermac commented 9 years ago

It fixed many issues I was having, and installation was better. However I am having trouble generating figures with matplotlib. And something funny is going on with the ip/input arrays (the bins aren't 100). Perhaps you can tell if this is a bug or some error on my part.

assert arrays['ip'].shape == (5708, 100) == arrays['input'].shape Traceback (most recent call last): File "", line 1, in AssertionError arrays['ip'].shape (5708, 2001) arrays['input'].shape (5708, 2001) fig = plt.figure() Traceback (most recent call last): File "", line 1, in File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure kwargs) File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager return new_figure_manager_given_figure(num, thisFig) File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure canvas = FigureCanvasQTAgg(figure) File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in init FigureCanvasQT.init(self, figure) File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in init** _create_qApp() File "/home/keller/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp raise RuntimeError('Invalid DISPLAY variable') RuntimeError: Invalid DISPLAY variable

daler commented 9 years ago

This is a matplotlib issue that occurs when it can't open a graphical window, often when running on a server without X11 installed.

See this section of the matplotlib docs for a solution, or see the answers from this SO question for more options.

As far as the array shape goes, is it possible you have a old saved *.npz file in your working directory that's getting loaded? If you're following the example and you have a example.npz file, try deleting it and then re-run the example.

kellermac commented 8 years ago

Thanks for the help! I am now able to complete the examples and print/save the images. One last question before we close this. When using metaseq's genomic_signal function I kept getting a syntax error when I made the object name anything other than ip_signal or input_signal. Are these the only valid object names? What if I want to compare >2 data sets? Here is the error messsage:

3hr_signal = 'metaseq.genomic_signal(os.path.join(data_dir, 'GSM1596239_2-3h_wt_MNase-seq_rep1.bw'),'bigWig')' File "", line 1 3hr_signal = 'metaseq.genomic_signal(os.path.join(data_dir, 'GSM1596239_2-3h_wt_MNase-seq_rep1.bw'),'bigWig')' ^ SyntaxError: invalid syntax

Thanks for the help, and the great work!

daler commented 8 years ago

Python does not allow variable names to start with numbers -- try changing your variable names to something like signal_3hr.

kellermac commented 8 years ago

Thank you.