Issue closed by mdondrup.
Thank you for pointing out this issue. It looks like pandas 2.0 and later breaks the version of plotnine (the library that generates the plots) that nPhase uses. FastQ file generation happens after plot generation because some datasets are large enough to cause an out-of-memory error, and I wanted users to at least have the plots in that case so they can tell whether it's worth running again with more memory.
Can you try downgrading pandas to 1.5.3 and running nPhase again on the test dataset in https://github.com/OmarOakheart/nPhase/tree/master/example? It should run very quickly since it's a small dataset.
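To confirm that a downgrade actually took effect in the environment nPhase runs from, a quick check like the following may help. It is a minimal sketch that reads the installed pandas version from package metadata with the standard-library `importlib.metadata` module (Python 3.8+, matching the environment in this thread), so pandas itself is never imported. The helper names `major_version` and `pandas_is_pre_2` are hypothetical, and only the major version number is compared.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.5.3'."""
    return int(version_string.split(".")[0])


def pandas_is_pre_2(installed: Optional[str] = None) -> bool:
    """Return True if pandas is older than 2.0, e.g. the 1.5.3 suggested above.

    When no version string is passed, the installed pandas version is read
    from package metadata, so pandas itself is never imported.
    """
    if installed is None:
        try:
            installed = version("pandas")
        except PackageNotFoundError as err:
            raise RuntimeError("pandas is not installed in this environment") from err
    return major_version(installed) < 2


if __name__ == "__main__":
    try:
        print("pandas older than 2.0:", pandas_is_pre_2())
    except RuntimeError as err:
        print(err)
```

Run it with the interpreter of the activated environment (for example after `micromamba activate polyploidPhasing`) so it inspects the same packages nPhase sees.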
Thank you for the reply. I installed pandas 1.5.3 and now I am getting a different error using the example data:
Phased files can be found at nphase_example_out/Example1/Phased
The *_variants.tsv file contains information on the consensus heterozygous variants present in each predicted haplotig.
The *_clusterReadNames.tsv file contains information on the reads which comprise each cluster.
/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py:727: PlotnineWarning: Saving 18 x 10 in image.
/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py:730: PlotnineWarning: Filename: nphase_example_out/Example1/Phased/Plots/Example1_0.1_0.01_0.05_0_phasedVis.svg
Traceback (most recent call last):
File "/home/ubuntu/micromamba/envs/polyploidPhasing/bin/nphase", line 11, in <module>
sys.exit(main())
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/bin/nPhasePipeline.py", line 587, in main
nPhasePipeline(args)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/bin/nPhasePipeline.py", line 194, in nPhasePipeline
nPhaseFunctions.generatePhasingVis(simpleOutPath,datavisPath)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/bin/nPhasePipelineFunctions.py", line 594, in generatePhasingVis
ggsave(g,filename=outputSVG,width=18,height=10)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 761, in ggsave
return plot.save(*arg, **kwargs)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 750, in save
raise err
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 747, in save
_save()
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 734, in _save
fig = figure[0] = self.draw()
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 181, in draw
return self._draw(return_ggplot)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 188, in _draw
self._build()
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/ggplot.py", line 284, in _build
layout.setup(layers, self)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/facets/layout.py", line 64, in setup
layer.data = self.facet.map(ldata, self.layout)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/facets/facet_wrap.py", line 136, in map
keys = join_keys(facet_vals, layout, self.vars)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/plotnine/utils.py", line 372, in join_keys
joint = pd.concat([x[by], pd.DataFrame([y[by]])], ignore_index=True)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/pandas/core/frame.py", line 762, in __init__
mgr = ndarray_to_mgr(
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 329, in ndarray_to_mgr
values = _prep_ndarraylike(values, copy=copy_on_sanitize)
File "/home/ubuntu/micromamba/envs/polyploidPhasing/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 583, in _prep_ndarraylike
raise ValueError(f"Must pass 2-d input. shape={values.shape}")
ValueError: Must pass 2-d input. shape=(1, 2, 1)
I received a different error report today from someone who pointed out that conda takes forever to run. To resolve that, I recommend using mamba (https://github.com/conda-forge/miniforge) instead of conda to create the environment. I then checked whether that worked and found that some of the instructions in the README were outdated. I think the matplotlib version I previously recommended downgrading to may have caused your issues.
I believe that if you use mamba and create a fresh environment, the test data should run without problems. I apologize for the inconvenience; please let me know if you have any trouble after that. So, to be clear,
micromamba create -n polyploidPhasing -c oakheart nphase -c bioconda
micromamba activate polyploidPhasing
These commands should now be sufficient to install nPhase; there is no need to downgrade pandas, matplotlib, or anything else.
I created a fresh environment with the commands above. We use the setup from the Biostar Handbook, so we were already using micromamba. This installs:
ModuleNotFoundError: No module named 'matplotlib._contour'
(import-time error)
I then tested the other combinations:
ValueError: Must pass 2-d input. shape=(1, 2, 1)
(Runtime error following clustering)
ValueError: Must pass 2-d input. shape=(1, 2, 1)
ModuleNotFoundError: No module named 'matplotlib._contour'
Dear @OmarOakheart, I suspect there is a version conflict between Python packages somewhere. Could you run the following from within a Python environment where the test works:
import pkg_resources
installed_packages = pkg_resources.working_set
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
for i in installed_packages])
print(installed_packages_list)
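`pkg_resources` is deprecated in recent setuptools releases. On Python 3.8+ the standard-library `importlib.metadata` module can produce a similar sorted `name==version` listing without relying on setuptools. This is a sketch: the function names are illustrative, and names are only lower-cased, so the output may differ slightly from `pkg_resources`, which applies its own name normalization.

```python
from importlib.metadata import distributions
from typing import Iterable, List, Tuple


def format_packages(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (name, version) pairs into a sorted, de-duplicated 'name==version' list.

    Names are lower-cased to approximate the keys that pkg_resources prints.
    """
    return sorted({f"{name.lower()}=={ver}" for name, ver in pairs})


def installed_packages() -> List[str]:
    """List every distribution visible to the current interpreter."""
    pairs = (
        (dist.metadata["Name"], dist.version)
        for dist in distributions()
        if dist.metadata["Name"]  # skip broken installs that lack a Name field
    )
    return format_packages(pairs)


if __name__ == "__main__":
    print(installed_packages())
```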
['backports.zoneinfo==0.2.1', 'certifi==2023.11.17', 'contourpy==1.1.1', 'cycler==0.12.1', 'descartes==1.1.0',
'fonttools==4.46.0', 'importlib-resources==6.1.1', 'kiwisolver==1.4.5', 'matplotlib==3.7.3', 'mizani==0.9.3', 'nphase==1.2.0',
'numpy==1.24.4', 'olefile==0.47', 'packaging==23.2', 'pandas==2.0.3', 'patsy==0.5.4', 'pillow==8.2.0', 'pip==23.3.1',
'plotnine==0.7.1', 'pyparsing==3.1.1', 'pyqt5-sip==4.19.18', 'pyqt5==5.12.3', 'pyqtchart==5.12', 'pyqtwebengine==5.12.1',
'python-dateutil==2.8.2', 'pytz==2023.3.post1', 'scipy==1.9.3', 'setuptools==68.2.2', 'six==1.16.0',
'sortedcontainers==2.4.0', 'statsmodels==0.14.0', 'tornado==6.3.3', 'tzdata==2023.3', 'unicodedata2==15.1.0',
'wheel==0.42.0', 'zipp==3.17.0']
Follow-up: I tried installing with pip instead, and that worked. I created a conda environment for the dependencies:
micromamba create -n nphasegit python=3.8 bwa gatk=4.3 samtools=1.9 ngmlr
micromamba activate nphasegit
pip install -U nPhase
The packages with version differences are:
It might help to include or update the version requirements in the conda package according to these.
Best regards, Michael
['backports.zoneinfo==0.2.1', 'contourpy==1.1.1', 'cycler==0.12.1', 'fonttools==4.46.0', 'importlib-resources==6.1.1', 'kiwisolver==1.4.5', 'matplotlib==3.7.4', 'mizani==0.9.3', 'nphase==1.2.0', 'numpy==1.24.4', 'packaging==23.2', 'pandas==2.0.3', 'patsy==0.5.4', 'pillow==10.1.0', 'pip==23.3.1', 'plotnine==0.12.4', 'pyparsing==3.1.1', 'python-dateutil==2.8.2', 'pytz==2023.3.post1', 'scipy==1.10.1', 'setuptools==68.2.2', 'six==1.16.0', 'sortedcontainers==2.4.0', 'statsmodels==0.14.0', 'tzdata==2023.3', 'wheel==0.42.0', 'zipp==3.17.0']
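Comparing two environments by eye is error-prone, so the "packages with version differences" can be computed directly from two `name==version` lists like the ones posted in this thread. A minimal sketch, assuming each entry has exactly the `name==version` form printed by the snippet above; `parse_pins` and `diff_environments` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pins(pins: Iterable[str]) -> Dict[str, str]:
    """Map 'name==version' strings to {lower-cased name: version}."""
    parsed = {}
    for pin in pins:
        name, _, ver = pin.partition("==")
        parsed[name.strip().lower()] = ver.strip()
    return parsed


def diff_environments(
    env_a: Iterable[str], env_b: Iterable[str]
) -> Tuple[Dict[str, Tuple[str, str]], List[str], List[str]]:
    """Compare two package lists.

    Returns (changed, only_in_a, only_in_b), where `changed` maps each package
    present in both lists to its (version_in_a, version_in_b) pair.
    """
    a, b = parse_pins(env_a), parse_pins(env_b)
    changed = {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    return changed, only_in_a, only_in_b


if __name__ == "__main__":
    # Short excerpts of the two package lists posted in this thread.
    first_list = ["matplotlib==3.7.3", "pandas==2.0.3", "pillow==8.2.0",
                  "plotnine==0.7.1", "descartes==1.1.0"]
    second_list = ["matplotlib==3.7.4", "pandas==2.0.3", "pillow==10.1.0",
                   "plotnine==0.12.4"]
    changed, only_first, only_second = diff_environments(first_list, second_list)
    for name, (old, new) in changed.items():
        print(f"{name}: {old} -> {new}")
    print("only in first list:", only_first)
    print("only in second list:", only_second)
```

Applied to the two full lists in this thread, the shared packages whose versions differ are matplotlib, pillow, plotnine and scipy; the first list additionally contains packages such as descartes, olefile and the PyQt bindings.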
Hello,
Sorry for the delay. You were right: there was a package incompatibility caused by nPhase 1.2.0 pinning a specific version of plotnine, a pin that is no longer necessary. I was able to replicate your issue (when I previously installed with micromamba, I didn't realize it had installed an older version of nPhase).
I've uploaded nPhase 1.2.1 which does not have this requirement, and it can now install properly. I've also tested that it produces plots and fastQ files on my new installation.
I think that should resolve the issue fully. You can make sure you're installing the correct version by running
micromamba create -n polyploidPhasing -c oakheart nphase=1.2.1 -c bioconda
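After installing, one way to double-check which nPhase version ended up in the environment, and whether it still declares an exact plotnine pin, is to read its package metadata. This is a sketch under an assumption: it only sees requirements recorded in the Python package metadata; if a pin lives only in the conda recipe, this check will not show it. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, requires, version
from typing import Iterable, List, Optional


def exact_pins(requirements: Iterable[str], package: str) -> List[str]:
    """Return the requirement strings that pin `package` to an exact version.

    Matches both 'plotnine==0.7.1' and the older 'plotnine (==0.7.1)' form.
    """
    pattern = re.compile(rf"^{re.escape(package)}\s*\(?\s*==", re.IGNORECASE)
    return [req for req in requirements if pattern.match(req.strip())]


def plotnine_pins(dist_name: str = "nphase") -> Optional[List[str]]:
    """Exact plotnine pins declared in an installed distribution's metadata.

    Returns None if the distribution is not installed.
    """
    try:
        declared = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    return exact_pins(declared, "plotnine")


if __name__ == "__main__":
    pins = plotnine_pins()
    if pins is None:
        print("nphase is not installed in this environment")
    else:
        print("nphase version:", version("nphase"))
        print("exact plotnine pins:", pins or "none")
```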
Thank you for the fix. I have tested it on the example data and I can confirm that plots and phased FastQ files are produced, with a few minor warnings:
/home/ubuntu/micromamba/envs/polyploidPhasing-2/lib/python3.8/site-packages/plotnine/themes/themeable.py:1902: FutureWarning: You no longer need to use subplots_adjust to make space for the legend or text around the panels. This paramater will be removed in a future version. You can still use 'plot_margin' 'panel_spacing' for your other spacing needs.
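This FutureWarning does not stop plots or FastQ files from being written. If it clutters the output of a Python script that imports plotnine (for example, a hypothetical script that regenerates the plots), the standard `warnings` module can silence just this message. A minimal sketch: the message prefix is copied from the warning above, and the function name is illustrative.

```python
import warnings


def silence_subplots_adjust_warning() -> None:
    """Ignore plotnine's FutureWarning about subplots_adjust in this process only.

    `message` is a regular expression matched against the start of the
    warning text; other FutureWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"You no longer need to use subplots_adjust",
        category=FutureWarning,
    )
```

Call it once near the top of the script, before plotnine draws anything.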
Thank you for pointing out the warning; I'm glad the issue was resolved. Don't hesitate to contact me if you need any help running nPhase on your data.
Best, Omar
I am trying to run nPhase on long-read and short-read data from a tetraploid yeast strain. However, after running the nphase pipeline or algorithm, no plots are generated in Phased/Plots and no files appear in Phased/FastQ, although the .tsv files are generated.
Command:
There is no error message in the log file, but an error is printed on STDERR. Here is the output of the run:
nPhase was installed with conda on Ubuntu, following the instructions.
Python and pandas versions:
Linux ubuntu-compute-2 5.-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
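Since the failures in this thread came from mismatches between pandas, plotnine and matplotlib, the most useful details in a report like this are the Python version and the versions of that plotting stack. A minimal sketch that collects them with the standard library (Python 3.8+); the default package list is an assumption and can be extended, and packages that are not installed are reported as missing rather than raising.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def environment_report(
    packages: Iterable[str] = ("pandas", "plotnine", "matplotlib", "nphase"),
) -> Dict[str, Optional[str]]:
    """Collect the Python version plus installed versions of the given packages.

    Packages that are not installed are reported as None.
    """
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}: {ver or 'not installed'}")
```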