ARM-DOE / pyart

The Python-ARM Radar Toolkit. A data model driven interactive toolkit for working with weather radar data.
https://arm-doe.github.io/pyart/

BUG: Error when using CyLP #1022

Open tanelv opened 3 years ago

tanelv commented 3 years ago

After finally managing to install CyLP, I still cannot use it in phase_proc_lp (pyart.correct.phase_proc_lp(radar, 2.0, self_const=12000.0, low_z=0.0, high_z=53.0, min_phidp=0.01, min_ncp=0.3, min_rhv=0.8, LP_solver='cylp_mp', proc=15)). The error seems to be "Error in `python': free(): invalid pointer: 0x00005597c77d6c98".

A long list of messages and a memory map is printed (attached: cylp_messages.txt), and then the script just hangs.

I installed CyLP following these instructions https://github.com/coin-or/CyLP

I also tried installing CyLP following the instructions in the Py-ART documentation (https://arm-doe.github.io/pyart/setting_up_an_environment.html), but without success: I got what looked like compilation errors even after installing additional conda compilers. So the original CyLP installation instructions worked, but for some reason the phase_proc_lp function still does not work.

zssherman commented 3 years ago

Hmmm, I haven't seen that error before. How large are the files you are trying to process? What OS are you using? I'll try to install using their method to see if I can reproduce it. I usually pip install the Python 3 branch of jjhelmus's CyLP fork, as it seemed to be more stable, but I'm confused why the compilers wouldn't have helped.

zssherman commented 3 years ago

Also, while I try to reproduce it, I recommend opening an issue on their issue tracker as well. Maybe someone there has experienced the same issue.

kmuehlbauer commented 3 years ago

@tanelv I'm wondering why two different environments (cbc and wradlib_xr) are involved in the traceback. If things get picked up from another environment, this is usually a big source of problems.

If you can provide any additional details, this would help very much for diagnosing.
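One quick way to check for the cross-environment mix-up raised above is to ask Python where each relevant package would actually be imported from, and flag anything outside the active environment. A minimal sketch using only the standard library; the package names in the hypothetical PACKAGES tuple are assumptions to adjust for your own setup, and the modules are located with importlib.util.find_spec rather than imported:

```python
import importlib.util
import os
import sys

# Hypothetical list of packages to inspect; adjust for your own setup.
PACKAGES = ("pyart", "cylp", "numpy", "scipy")


def module_locations(names):
    """Map each top-level module name to its origin path (None if not found)."""
    locations = {}
    for name in names:
        spec = importlib.util.find_spec(name)  # locates without importing
        locations[name] = spec.origin if spec is not None else None
    return locations


def outside_prefix(locations, prefix=sys.prefix):
    """Return the modules whose files live outside the given environment prefix."""
    prefix = os.path.realpath(prefix)
    flagged = {}
    for name, origin in locations.items():
        # Skip missing modules and origins that are not real file paths.
        if origin is None or origin in ("built-in", "frozen", "namespace"):
            continue
        path = os.path.realpath(origin)
        if os.path.commonpath([prefix, path]) != prefix:
            flagged[name] = path
    return flagged


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    locs = module_locations(PACKAGES)
    for name, origin in locs.items():
        print(f"{name:8s} -> {origin}")
    for name, path in outside_prefix(locs).items():
        print(f"WARNING: {name} is loaded from outside {sys.prefix}: {path}")
```

Run it with the same python that runs the processing script. In a conda environment, site-packages lives under sys.prefix, so a warning for pyart or cylp would suggest a package leaking in from another environment or from a pip --user install.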

zssherman commented 3 years ago

Ah good catch @kmuehlbauer ! Yeah I second that as well.

tanelv commented 3 years ago

Hm, good points. The files are IRIS RAW files, around 5-10 MB each. I use CentOS 7:

(cbc) [a93859@stage63 ~]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

The files should be OK: I successfully used the phase_proc_lp function with CyLP on the same files on an older server (running Scientific Linux 6.9), but our university is moving to a new HPC system and I need to migrate to it.

Yes, there are two environments. I first tried to get CyLP working in the wradlib_xr environment (which has both wradlib and pyart installed), but as I could not get it working there, I created a new environment only for pyart (the cbc env). Actually, I did eventually reach the same point in the wradlib_xr env: the CyLP installation succeeded, but the script hangs with the same error. When running the script in the wradlib_xr env, the traceback does not refer to the other env. But if having two environments might still cause trouble, should I delete both, make a new environment and try installing there?

kmuehlbauer commented 3 years ago

But if the two environments still might cause troubles, should I delete both and make a new environment and try to install there?

Just to be on the safe side. It might not solve the issue, but we would know for sure then.

tanelv commented 3 years ago

Sorry for taking so long to answer. I first tried to update the current Anaconda installation, but after it had been solving the environment for more than 6 hours I stopped it, removed Anaconda completely and installed a fresh Anaconda using the latest version (https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh). These are the steps I took:

wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
bash Anaconda3-2021.11-Linux-x86_64.sh
conda create -n pyart_py38 python=3.8 arm_pyart coin-or-cbc numba gdal -c conda-forge
conda activate pyart_py38
conda install -c conda-forge pkg-config
pip install cylp

To install CyLP I followed these instructions https://github.com/coin-or/CyLP

And it still hangs with the same error as before (*** Error in `python': free(): invalid pointer: 0x000055a16c1efc68 ***). Log: cylp_log2.txt (it also shows all the steps I took, starting from creating the new environment).

zssherman commented 2 years ago

Sorry for the late response. I'm wondering if an older version would work. I'm not familiar with the new CyLP version, so I'm seeing what I can find out about it, but maybe try installing coincbc with conda install -c conda-forge coincbc and then CyLP with pip install git+https://github.com/jjhelmus/CyLP.git@py3; that has worked for our package cmac. I would still raise an issue on the CyLP issue tracker as well.

tanelv commented 2 years ago

Thanks for the suggestion. To be safe, I removed the old pyart_py38 env and created a new one with the same name. I tried to install using the suggestions above with the jjhelmus version, but I get compilation errors no matter which compiler I use: "error: command '/gpfs/space/home/a93859/anaconda3/envs/pyart_py38/bin/x86_64-conda_cos6-linux-gnu-gcc' failed with exit status 1". Full error log: cylp_compile_error_log.txt. I have now also raised an issue on the CyLP issue tracker: https://github.com/coin-or/CyLP/issues/136

tkralphs commented 2 years ago

The fork https://github.com/jjhelmus/CyLP/tree/py3 of CyLP has been merged into master (see coin-or/CyLP#28) and other things have been fixed since then, so I doubt that rolling back to that version will help. Also, coincbc now just installs coin-or-cbc (see conda-forge/coin-or-cbc-feedstock#11), so that also shouldn't make a difference. If you can replicate this in stand-alone CyLP (or even better in stand-alone Cbc), I can take a look, but there's not enough information in coin-or/CyLP#136 to even start to debug.

zssherman commented 2 years ago

Thanks @tkralphs for the response! Yeah, makes sense, I'll keep trying to reproduce the error. @tanelv Are you able to share one of the files that you're using?

tanelv commented 2 years ago

Here is one of the files SUR190511130002.zip (IRIS raw)

kmuehlbauer commented 2 years ago

@zssherman It would be great if we could join forces on this one. I'm interested in getting this working too.

zssherman commented 2 years ago

@kmuehlbauer Awesome, yeah, that sounds like a great idea to me! I haven't been able to reproduce the specific error yet, but the code does hang on these files, so I've been digging through the code. I have also tried the non-multiprocessing version of the code to isolate the problem.

kmuehlbauer commented 2 years ago

@zssherman My idea is to start from the last working environment, if we can identify one. Then we could bump versions step by step and see which one breaks. Ideally we would set this up using CI in a dedicated branch of our pyart forks. I'll try to get something running, but this might take some time.
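Starting from a known-good environment and bumping versions until something breaks is essentially a search for the first failing version. A minimal sketch, assuming a hypothetical works(version) callable (for example, one that builds an environment with that version and runs the processing script), and assuming failures are monotonic, i.e. once a version fails, every later version fails too. If that assumption does not hold, a plain linear scan is safer:

```python
from typing import Callable, Optional, Sequence


def first_failing(versions: Sequence[str], works: Callable[[str], bool]) -> Optional[str]:
    """Binary search for the first version for which works() returns False.

    Assumes versions are ordered oldest to newest and that failures are
    monotonic (every version after the first failing one also fails).
    Returns None if every version works.
    """
    lo, hi = 0, len(versions)  # the first failing index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid + 1  # mid is fine, so the first failure is later
        else:
            hi = mid      # mid fails, so the first failure is mid or earlier
    return versions[lo] if lo < len(versions) else None
```

With a toy predicate where only "3.6" works, first_failing(["3.6", "3.7", "3.8", "3.9"], lambda v: v == "3.6") returns "3.7" after three calls. The point of bisecting is that each call to works() may mean a full environment build in CI, so needing about log2(n) builds instead of n matters.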

zssherman commented 2 years ago

@kmuehlbauer Gotcha, sounds good! Just to try anything, I installed coincbc from conda-forge together with the py3 branch of CyLP in Python 3.6, and I was able to run the CyLP code. When I updated Python and CyLP, the code started to hang on every file I tried, including the user's file above. The py3 branch of CyLP only works with Python 3.6. Python 3.6 is quite old, so I'm not sure how useful this is, but something changed between then and now. Either the current KDP processing code doesn't handle those changes and needs to be updated, or something else is causing memory issues. I'm checking coin-or-cbc as well.

zssherman commented 2 years ago

@kmuehlbauer The environment I used was: conda create -n cylp_test -c conda-forge python=3.6 numpy netCDF4 coin-or-cbc scipy matplotlib cython gcc_linux-64 gxx_linux-64, with a development install of pyart and a GitHub install of the Python 3 branch of CyLP.

tanelv commented 2 years ago

Thanks @zssherman for the Python 3.6 reference. I also managed to get CyLP installed in Python 3.6 and my script now runs as it should. These are the steps I took (I removed the old environment before that)

conda create -n pyart_py36 -c conda-forge python=3.6 numpy netCDF4 scipy matplotlib cython gcc_linux-64 gxx_linux-64 arm_pyart coincbc gdal
conda activate pyart_py36
pip install git+https://github.com/jjhelmus/CyLP.git@py3

kmuehlbauer commented 2 years ago

@zssherman Just FYI, I've recreated the Python 3.6 environment as suggested, and it worked. I've also created environments for Python 3.7, 3.8 and 3.9. It looked promising at first, but now nothing works, not even the Python 3.6 environment. I have to restart from scratch.

I've found those interesting issues over at CyLP, which might be connected:

Also, I found that we have to be careful with the Cython version, and we would need to regenerate the .cpp files in any case.

zssherman commented 2 years ago

@kmuehlbauer Sorry for the late response, I was on vacation. That makes sense, and yes, that is helpful, thanks for finding those! I'm trying to think how to go about this next, because it almost seems like a memory leak issue.

zssherman commented 2 years ago

As a side note, we will be having assistance on this soon and will most likely do an overhaul of the kdp processing code.

mgrover1 commented 2 years ago

So it looks like Google has an or-tools package that can access the same linear programming solvers we use through cylp.

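For example, check out this walkthrough of a mixed-integer programming problem. Here is a list of the available solvers:

CLP_LINEAR_PROGRAMMING or CLP
CBC_MIXED_INTEGER_PROGRAMMING or CBC
GLOP_LINEAR_PROGRAMMING or GLOP
BOP_INTEGER_PROGRAMMING or BOP
SAT_INTEGER_PROGRAMMING or SAT or CP_SAT
SCIP_MIXED_INTEGER_PROGRAMMING or SCIP
GUROBI_LINEAR_PROGRAMMING or GUROBI_LP
GUROBI_MIXED_INTEGER_PROGRAMMING or GUROBI or GUROBI_MIP
CPLEX_LINEAR_PROGRAMMING or CPLEX_LP
CPLEX_MIXED_INTEGER_PROGRAMMING or CPLEX or CPLEX_MIP
XPRESS_LINEAR_PROGRAMMING or XPRESS_LP
XPRESS_MIXED_INTEGER_PROGRAMMING or XPRESS or XPRESS_MIP
GLPK_LINEAR_PROGRAMMING or GLPK_LP
GLPK_MIXED_INTEGER_PROGRAMMING or GLPK or GLPK_MIP

This package is pip installable and works with the most recent Python versions.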

kmuehlbauer commented 2 years ago

@mgrover1 That's available from within conda-forge (ortools-python), too.

mgrover1 commented 2 years ago

@mgrover1 That's available from within conda-forge (ortools-python), too.

Awesome - yeah, it looks like they have a Simplex option, which is what is currently used...

scollis commented 2 years ago
Very excited to see this happening.

tkralphs commented 2 years ago

There is no shortage of Python interfaces to MIP solvers. @mkoeppe recently compiled a nice list of all the options, which it would probably be useful to have somewhere other than a ticket in Sage, but here it is:

https://trac.sagemath.org/ticket/26511#comment:56

I believe or-tools uses file I/O to pass instances to a stand-alone Cbc solver and also uses pure Python to build the model, so it's going to be much slower than CyLP. I'm not sure if speed is important for you but if not, then or-tools would probably serve your purpose. python-mip calls the Cbc library directly through cffi so it's passing the instance to Cbc in memory, but would still be slower than CyLP because it also builds the model in Python. I realize that you guys have struggled a lot with CyLP and it makes sense to look at alternatives, but just wanted to make you aware of the tradeoffs.

By the way, I'm not sure if you guys saw it, but @mkoeppe and I just finished some major improvements to CyLP and there are now binary wheels for all platforms, dramatically simplifying installation (no need to install Cbc, see here).

Whether you continue with CyLP or not, I'm still interested in tracking down this bug.

mgrover1 commented 2 years ago

@tkralphs Thanks for your response. As someone who is new to MIP solvers, I appreciate your insight on the Python MIP interfaces and your work on CyLP.

The main reason for looking into alternatives was the requirement to use Python 3.6, which was causing issues when installing the rest of the environment we use for Py-ART.

That is fantastic news about the improved installation steps! I just tried it out with a Python 3.9 environment, and it worked beautifully. Happy to provide feedback where we can, and again, thanks for all your work with CyLP.

tkralphs commented 2 years ago

Just to be clear, CyLP works with any version of Python. I am using it with Python 3.10. I think the Python 3.6 "requirement" came from the fact that installing it in 3.6 seemed to overcome the particular bug reported here for some reason, but I think the situation is not at all clear at this point. Some more digging is needed. If someone could try to replicate this issue with the new wheels, that would be helpful. Perhaps that will fix the bug somehow.

mgrover1 commented 2 years ago

Using the new wheels, I am still seeing the following error when running our example with CyLP:

Processing Code:

import numpy as np
import matplotlib.pyplot as plt
import pyart
from pyart.testing import get_test_data

file = get_test_data('095636.mdv')

# perform LP phase processing (this takes a while)
radar = pyart.io.read_mdv(file)

# the next line forces only the first sweep to be processed; this
# significantly speeds up the calculation, but it should be commented out
# in production so that the entire volume is processed
radar = radar.extract_sweeps([0])

phidp, kdp = pyart.correct.phase_proc_lp(radar, 0.0, debug=True)

Error:

Exec time:  0.5900969505310059
Doing  0
python(43345,0x117afc600) malloc: *** error for object 0x7f7fad9c6660: pointer being freed was not allocated
python(43345,0x117afc600) malloc: *** set a breakpoint in malloc_error_break to debug
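Because this failure is a hard abort inside native code (the allocator kills the process instead of raising a Python exception), it can help to run the suspect call in a separate interpreter, so the parent records how it ended instead of crashing. It may also be relevant that with LP_solver='cylp_mp' the abort would happen in a worker process, which could plausibly explain the hang reported at the top of this issue, though that is not confirmed here. A minimal sketch using only the standard library; the snippet passed in is a placeholder for whatever reproduces the problem, and the default timeout is an arbitrary assumption:

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 600.0):
    """Run a Python snippet in a fresh interpreter and report how it ended.

    Returns (status, returncode) where status is "ok", "error" (non-zero
    exit or killed by a signal) or "timeout" (returncode is then None).
    On POSIX, a negative returncode is the number of the signal that
    killed the child, e.g. -6 for SIGABRT.
    """
    try:
        # subprocess.run kills the child and waits for it if the timeout expires.
        proc = subprocess.run([sys.executable, "-c", snippet], timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout", None
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.returncode
```

For example, passing the processing code above as the snippet and getting ("error", -6) on Linux or macOS would indicate the child died from SIGABRT, which is what glibc and the macOS allocator raise for "invalid pointer" / "pointer being freed was not allocated" errors, while "timeout" would point at a hang rather than a crash.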

mgrover1 commented 2 years ago

@tkralphs we are running into the same issue described in https://github.com/coin-or/CyLP/issues/138 I believe...

tkralphs commented 2 years ago

OK, thanks, I will try to find some time to build a version of CyLP and Cbc with debugging symbols, so that I can see exactly where this error is occurring.

mgrover1 commented 2 years ago

It looks like printing the array returned by CyLP works:

print(solution)
[[ 2.34766422  2.43061544  3.18696968 ... 34.95957546 34.96160985
  34.96285309]
...
 [ 0.66390759  0.88923667  1.24627207 ... 27.49033571 27.46688439
  27.41206319]
 [ 0.90856286  1.30960192  1.81392097 ... 32.86643274 32.86900836
  32.87058235]

It is a numpy.ndarray:

<class 'numpy.ndarray'>

and we can take the mean of this array:

37.23997104342368

but when we assign this solution to a variable inside phase_proc_lp, we run into the malloc error:

python(63892,0x10f9fa600) malloc: *** error for object 0x7fe6d3f50660: pointer being freed was not allocated
python(63892,0x10f9fa600) malloc: *** set a breakpoint in malloc_error_break to debug
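One hypothesis consistent with printing working but a later assignment failing (not confirmed anywhere in this thread) is a memory-ownership problem: if the array handed back by the native solver is a view onto a buffer that the C/C++ side also frees, the double free only surfaces later. A quick check is whether NumPy owns the array's memory, and a defensive workaround is to take an owned copy right away. A minimal sketch; solution stands in for whatever CyLP returns, and the helper names are hypothetical:

```python
import numpy as np


def describe_ownership(arr: np.ndarray) -> dict:
    """Summarize whether NumPy owns this array's memory or borrows it."""
    return {
        # False means the buffer belongs to some other object (arr.base).
        "owndata": bool(arr.flags.owndata),
        "base_type": type(arr.base).__name__ if arr.base is not None else None,
        "c_contiguous": bool(arr.flags.c_contiguous),
    }


def owned_copy(arr) -> np.ndarray:
    """Return a C-contiguous copy whose buffer is allocated and owned by NumPy."""
    return np.array(arr, copy=True, order="C")
```

Calling describe_ownership(solution) right after the solve shows whether the result borrows its memory; if it does, solution = owned_copy(solution) detaches later Python code from the solver's buffer. If the heap was already corrupted inside the solver, copying will not help, and the debug build of CyLP and Cbc mentioned above is the better tool.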

mgrover1 commented 2 years ago

@tkralphs following up - have you had a chance to look at the build error here?

mole-bai commented 2 years ago

Thanks @zssherman for the Python 3.6 reference. I also managed to get CyLP installed in Python 3.6 and my script now runs as it should. These are the steps I took (I removed the old environment before that)

conda create -n pyart_py36 -c conda-forge python=3.6 numpy netCDF4 scipy matplotlib cython gcc_linux-64 gxx_linux-64 arm_pyart coincbc gdal
conda activate pyart_py36
pip install git+https://github.com/jjhelmus/CyLP.git@py3

@tanelv I have the same problem as you, but after following your steps to install CyLP, I get an error when running the test code:

Processing Code: https://github.com/coin-or/CyLP#modeling-example

Error: undefined symbol: _ZN17CoinIndexedVectorD2Ev

Besides using pip install git+https://github.com/jjhelmus/CyLP.git@py3 to install cylp-0.7.4, did you do any other configuration?

mole-bai commented 2 years ago

@zssherman I encountered the same problem and rolled back to CyLP 0.7.4 as follows:

conda create -n pyart_py36 -c conda-forge python=3.6 numpy netCDF4 scipy matplotlib cython gcc_linux-64 gxx_linux-64 arm_pyart coincbc gdal 
conda activate pyart_py36
pip install git+https://github.com/jjhelmus/CyLP.git@py3

but in CyLP 0.7.4, even the most basic imports report an error.

So, to run this function successfully now:

pyart.correct.phase_proc_lp(radar, 2.0, self_const = 12000.0, low_z=0.0, high_z=53.0, min_phidp=0.01, min_ncp=0.3, min_rhv=0.8, LP_solver='cylp_mp', proc=15)

How should I configure my CyLP and Pyart environments? Looking forward to your reply!

mgrover1 commented 2 years ago

@mole-bai We are working on replacing the CyLP solver in Py-ART. In the meantime, you can use one of the other solvers (LP_solver="pyglpk" or LP_solver="cvxopt").

We apologize that we are not able to support solving this CyLP issue.
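Following the suggestion to switch solvers, a script can pick whichever LP backend is actually importable and pass that name as LP_solver. A minimal sketch; the mapping from Py-ART's LP_solver strings to module names (glpk for "pyglpk", cvxopt for "cvxopt", cylp for "cylp") is an assumption to check against your Py-ART version, and choose_lp_solver is a hypothetical helper, not part of Py-ART:

```python
import importlib.util

# Assumed mapping of Py-ART LP_solver names to the module each one imports.
# Order expresses preference: the first backend that can be found wins.
# CyLP is listed last because it is the backend failing in this issue.
DEFAULT_PREFERENCES = (
    ("pyglpk", "glpk"),
    ("cvxopt", "cvxopt"),
    ("cylp", "cylp"),
)


def choose_lp_solver(preferences=DEFAULT_PREFERENCES) -> str:
    """Return the first LP_solver name whose backing module can be found."""
    for solver_name, module_name in preferences:
        if importlib.util.find_spec(module_name) is not None:
            return solver_name
    raise RuntimeError(
        "No supported LP solver module found; tried: "
        + ", ".join(module for _, module in preferences)
    )
```

The result would then go into the original call, e.g. pyart.correct.phase_proc_lp(radar, 2.0, ..., LP_solver=choose_lp_solver()). The proc argument only matters for the multiprocessing 'cylp_mp' path, so it can be dropped when another backend is chosen. find_spec only checks that a module can be located; a backend whose compiled library is broken (like the undefined-symbol error above) would still be selected and fail on import.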