Infeasible solution for EMD but inputs are in the simplex?

alsuhr-c commented 4 years ago

Describe the bug A call to the EMD solver is raising a UserWarning about the problem being infeasible (Check that a and b are in the simplex). I dug into the code and this seems to be raised only when a value in one of the two inputs is less than zero. However, I am printing the sum, max, and min of the inputs and don't see any odd behavior (sum to one, max values not greater than 1, min values not less than zero; inputs are the same size; type is np.float64).

Could this be an issue with underflow?

To Reproduce Steps to reproduce the behavior: using python3.7:

import ot
import torch
import random
import numpy as np

# Get distributions
random_dist = torch.softmax(torch.tensor([random.random() * 1000 for _ in range(625)]), dim=0).numpy()
random_dist2 = torch.softmax(torch.tensor([random.random() * 1000 for _ in range(625)]), dim=0).numpy()
print('--- First dist ---')
print('Shape: %s' % random_dist.shape)
print('Sum: %s' % np.sum(random_dist))
print('Max: %s' % random_dist.max())
print('Min: %s' % random_dist.min())
print('--- Second dist ---')
print('Shape: %s' % random_dist2.shape)
print('Sum: %s' % np.sum(random_dist2))
print('Max: %s' % random_dist2.max())
print('Min: %s' % random_dist2.min())

# Create distance matrix
coords = np.reshape(np.asarray(np.meshgrid(np.linspace(0, 24, 25), np.linspace(0, 24, 25)))[[1, 0], :, :].transpose((1, 2, 0)), [-1, 2])
tiled = np.tile(coords[:, np.newaxis, :], [1, coords.shape[0], 1])
ttiled = tiled.transpose((1, 0, 2))
distances = np.array(np.linalg.norm(tiled - ttiled, axis=2), order='C')

# Compute EMD
ot.emd(random_dist, random_dist2, distances)

Example output:

--- First dist ---
Shape: 625
Sum: 1.0
Max: 0.42820615
Min: 0.0
--- Second dist ---
Shape: 625
Sum: 1.0
Max: 0.96281725
Min: 0.0
/home/.../venv/lib/python3.7/site-packages/ot/lp/__init__.py:113: UserWarning: Problem infeasible. Check that a and b are in the simplex
  result_code_string = check_result(result_code)

Expected behavior I would expect that this error is not thrown as all of the values in both distributions are nonnegative.

Screenshots N/A, minimal working example above

Desktop (please complete the following information):

OS: I can reproduce it on Linux (Ubuntu 16.04.1) and Mac OS (10.14.6)
Python version: 3.7.0 on Linux; 3.7.5 on Mac
- How was POT installed: pip and conda (two different machines)

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

Linux:

Linux-4.15.0-65-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.7.0 (default, Aug  6 2018, 20:07:46) 
[GCC 5.4.0 20160609]
NumPy 1.17.3
SciPy 1.4.1
POT 0.6.0

Mac:

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.5 (default, Oct 25 2019, 10:52:18) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.1
SciPy 1.4.1
POT 0.6.0

Additional context The scale for random.random() can be as low as 10 and the bug will happen. If the scale is 1, it can find a solution. Sounds like a numerical underflow problem to me.

Thanks!

alsuhr-c commented 4 years ago

By the way, looks related to #93 which does not have a resolution. I also am able to reproduce this on both Linux and Mac OS.

rflamary commented 4 years ago

Hello @alsuhr-c , thank you for the bug report

I believe the problem comes from the fact that on the simplex in float32 is sometimes not on the simplex in float64 (up to numerical precision). In other words if you do

random_dist/=random_dist.sum()

on the numpy float64 vectors it should work. Could you print the float for the sum with 16 digits '%1.16f') so that we can confirm the error please?

I have seen the same probem in my code when going from pytorch to numpy to solve emd. I think we should provide also pytorch compatible functioin that can handle this kind if problems in POT but i never find the time to do a clean PR .

alsuhr-c commented 4 years ago

I inserted the lines:

random_dist /= random_dist.sum()
random_dist2 /= random_dist2.sum()

between the softmaxes and printing the distributions, and printed using '%1.64f'. I also reduced the size of the distributions from 625 to 4 (which also requires changing the distance matrix computation). However, I am still seeing the same behavior (on both the Linux and Mac machines:

--- First dist ---
Shape: 4
Sum: 1.0000000000000000000000000000000000000000000000000000000000000000
Max: 0.7423889636993408203125000000000000000000000000000000000000000000
Min: 0.0029366279486566781997680664062500000000000000000000000000000000
--- Second dist ---
Shape: 4
Sum: 1.0000000000000000000000000000000000000000000000000000000000000000
Max: 0.9031170606613159179687500000000000000000000000000000000000000000
Min: 0.0035151618067175149917602539062500000000000000000000000000000000
/Users/.../lib/python3.7/site-packages/ot/lp/__init__.py:113: UserWarning: Problem infeasible. Check that a and b are in the simplex
  result_code_string = check_result(result_code)

I attached the minimal working example as a text file (Github doesn't allow py attachments I guess) if you want to try it: test_emd.txt

alsuhr-c commented 4 years ago

One work-around I am using now instead is to set all values less than a certain amount (has to be high, though) to zero, e.g.,

dist = np.where(dist > 0.001, dist, np.zeros(dist.shape)).

This will return a feasible solution, but the threshold must be relatively high (e.g., like above, 0.001, which is not great if my probability distribution has 625 elements where the uniform probability is 1/625 = 0.0016).

I also realized that INFEASIBLE might not just be set when values are below zero (Github search wasn't working). https://github.com/rflamary/POT/blob/master/ot/lp/network_simplex_simple.h also is setting it internally. I'm not sure how to use a debugger with cython though to test this code.

rflamary commented 4 years ago

@alsuhr-c I think I found the problem!

The .numpy() function in pytorch return float32 numpy arrays. If i convert them manually with .astype(np.float64) there is no problems anymore. In fact the function emd does the conversion from you but as i said after conversion the array is not on the simplex for float64.

import ot
import torch
import random
import numpy as np

# Get distributions
random_dist = torch.softmax(torch.tensor([random.random() * 10 for _ in range(4)]), dim=0).numpy().astype(np.float64)
random_dist2 = torch.softmax(torch.tensor([random.random() * 10 for _ in range(4)]), dim=0).numpy().astype(np.float64)

random_dist /= random_dist.sum()
random_dist2 /= random_dist2.sum()

print('--- First dist ---')
print('Shape: %s' % random_dist.shape)
print('Sum: %1.64f' % np.sum(random_dist))
print('Max: %1.64f' % random_dist.max())
print('Min: %1.64f' % random_dist.min())
print('--- Second dist ---')
print('Shape: %s' % random_dist2.shape)
print('Sum: %1.64f' % np.sum(random_dist2))
print('Max: %1.64f' % random_dist2.max())
print('Min: %1.64f' % random_dist2.min())

# Create distance matrix
coords = np.reshape(np.asarray(np.meshgrid(np.linspace(0, 1, 2), np.linspace(0, 1, 2)))[[1, 0], :, :].transpose((1, 2, 0)), [-1, 2])
tiled = np.tile(coords[:, np.newaxis, :], [1, coords.shape[0], 1])
ttiled = tiled.transpose((1, 0, 2))
distances = np.array(np.linalg.norm(tiled - ttiled, axis=2), order='C')

# Compute EMD
ot.emd(random_dist, random_dist2, distances)

Now it should work. We definitely have to tell people that teh conversion to float6 i happening with a warning in the documentation. You are welcome to propose a PR if you want.

alsuhr-c commented 4 years ago

Interesting. This works for the Linux machine, but still a problem for Mac (I tested it by fixing the random seed). For me, this is not a huge deal as I run experiments on Linux. But I noticed issues saying people are running into the problem with Mac OS.

I can try to create a PR. Specifically, raising a UserWarning when either of the two distributions is not of type np.float64 when emd is called? (Basically, requiring they be of type np.float64 when called.)

Also, what do you think about raising an error, rather than a warning, when a problem is infeasible? I'm not super familiar with what is going on behind the scenes in the EMD computation, but is it ever possible that the problem is infeasible when the inputs are on the simplex? If not, then I think raising an error would be a lot safer than raising a warning.

Warnings are only shown once to the user, and the result of the call to emd is in the expected format (all zeroes) when the problem is infeasible. I know personally that I and others are using EMD to compute the distance between two distributions, so we use the cost returned by emd (when log=True). This cost is simply zero when the problem is marked as infeasible. Because the warning is raised only once (so it's easily ignored), and the return value looks correct, this can be really dangerous. (Right now, I am using the warnings package to catch the warnings from POT as if they are errors.)

Another option, instead of actually raising an error, is to have the return value of emd be None or something when the warning is thrown, so that the calling code doesn't treat the return value like it's the actual solution.

rflamary commented 4 years ago

@ncourty could you have a look at the code above on Mac OS? This is indeed very weird.

ncourty commented 4 years ago

I confirm that the above code is rising the 'Problem infeasible' UserWarning without the .astype(np.float64) and is not anymore rising it when adding .astype(np.float64), as suggested in your code @rflamary.

My configuration is actually

Darwin-19.3.0-x86_64-i386-64bit Python 3.7.6 (default, Jan 8 2020, 13:42:34) [Clang 4.0.1 (tags/RELEASE_401/final)] NumPy 1.18.1 SciPy 1.4.1 POT 0.6.0

alsuhr-c commented 4 years ago

Ah, never mind, I was misreading where the error was coming from on the Mac! (I was trying resetting tiny values to zero and then running EMD, and this unsurprisingly resulted in an infeasible solution). The solution of converting to float64 and then dividing by the sum works on Mac.

Any thoughts on changing how the EMD solver reports this problem to the user? I think raising an exception would be safest. The user could catch such an exception if they want and ignore it, but there wouldn't be a return value that looks like a solution.

ncourty commented 4 years ago

hello @alsuhr-c , yes i definitely agree that raising an exception might is safest. If you feel like making a PR, go ahead ! Otherwise we'll try to take some time to implement it for the next release.

korawat-tanwisuth commented 2 years ago

I am wondering if the bug has been fixed. As @alsuhr-c suggested, throwing an exception would be better. The current implementation will just return a zero matrix for the transport plan. This is very dangerous as it looks like nothing goes wrong.

rflamary commented 2 years ago

yes it has and it will e in the next release very soon.

in the meantime you can install from the master branch in git that is very stable

PythonOT / POT

Infeasible solution for EMD but inputs are in the simplex? #126