OpenMined / Threepio

A multi-language library for translating commands between PyTorch, TensorFlow, and TensorFlow.js

Incorrect mapping of pytorch functions having underscores. #143

Open hafenkran opened 3 years ago

hafenkran commented 3 years ago

Description

Yesterday I discovered a possible bug in the lookup of PyTorch functions while working through some of the examples from your PySyft repository (Part 8). I was able to track the error down to this library. The problem concerns the mapping of PyTorch function names that contain underscores (e.g. log_softmax).

By executing the code from "How to Reproduce", I get the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-66-70a2e7eff419> in <module>()
     17 
     18 # translate the command
---> 19 translated_cmd = threepio.translate(cmd, lookup_command=True)
     20 translated_cmd[0].execute_routine()

/usr/local/lib/python3.6/dist-packages/pythreepio/threepio.py in translate(self, cmd, lookup_command)
    106     def translate(self, cmd: Command, lookup_command: bool = False) -> List[Command]:
    107         from_info = self.commands[self.from_lang][
--> 108             self._normalize_func_name(cmd.function_name, self.from_lang)
    109         ]
    110         if len(from_info) > 1:

KeyError: 'logsoftmax'

The problem here is that the lookup is done with the key "logsoftmax", which is not in the dictionary, although the command I passed was "log_softmax". An entry for "log_softmax" does exist in the lookup dictionary (I checked self.commands in the debugger).

There are two possible explanations.

First, it is intended that the _normalize_func_name() method always removes underscores, which it does by applying regex.sub("", func_name) with r"[^a-zA-Z]" as the regex. In this case, pythreepio/static/mapped_commands_full.json, which is used to populate the self.commands dictionary, is wrong. I was also able to track down a commit that introduced the underscores into mapped_commands_full.json (formerly named mapped_commands.json), see here: https://github.com/OpenMined/Threepio/commit/ad3bc159d23afa6d22784c65edc83935c40f3b1c

Second, functions with underscores are now meant to be supported (as the new mapped_commands_full.json lists them). In this case, the _normalize_func_name() method needs to be adjusted, for instance by using a regex like r"[^a-zA-Z_]", because otherwise commands with underscores (e.g. log_softmax) will never get mapped.
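To make the two options concrete, here is a minimal standalone sketch that does not import pythreepio. The two helper functions and the toy commands dictionary are hypothetical stand-ins: they only mirror the regexes quoted above and a single "log_softmax" key, not the library's actual code or the real contents of mapped_commands_full.json.

```python
import re


# Hypothetical stand-in for the current behaviour: drop everything
# that is not an ASCII letter, which also drops underscores.
def normalize_strip_underscores(func_name: str) -> str:
    return re.sub(r"[^a-zA-Z]", "", func_name)


# Hypothetical stand-in for the second option: keep underscores.
def normalize_keep_underscores(func_name: str) -> str:
    return re.sub(r"[^a-zA-Z_]", "", func_name)


# Toy lookup table; the values are placeholders, not real mapping data.
commands = {"log_softmax": ["placeholder"], "relu": ["placeholder"]}

stripped = normalize_strip_underscores("log_softmax")
kept = normalize_keep_underscores("log_softmax")

print(stripped)              # logsoftmax
print(stripped in commands)  # False, so commands[stripped] raises KeyError
print(kept in commands)      # True

# First option instead: keep the stripping regex, but normalize the
# dictionary keys the same way when the table is built.
commands_normalized_keys = {
    normalize_strip_underscores(key): value for key, value in commands.items()
}
print(stripped in commands_normalized_keys)  # True
```

Either variant makes the lookup succeed in this toy setup. Which one is appropriate depends on whether the underscores in the JSON keys are intentional.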

What would be the correct way to fix this?

How to Reproduce

Either run Part 8 of the PySyft examples or execute the following example:

import tensorflow as tf
import torch.nn.functional

from pythreepio.threepio import Threepio
from pythreepio.command import Command

threepio = Threepio("torch", "tf", tf)

args = [
    tf.constant([[1., 0.], [0., 1.]]),
    tf.constant([[0., 1.], [1., 0.]])
]

kwargs = {}
# corresponding to torch.nn.functional.log_softmax
cmd = Command("log_softmax", args, kwargs)

# translate the command
translated_cmd = threepio.translate(cmd, lookup_command=True)
translated_cmd[0].execute_routine()

Expected Behavior

Functions that have an underscore get mapped correctly.
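As a rough illustration of how this expectation could be checked, the sketch below lists every key of a lookup table that the normalizer would alter, since such a key can never be found under its own name. The helper names and the toy table are assumptions for illustration only; a real check would load the keys from mapped_commands_full.json and use the library's own normalization method.

```python
import re
from typing import Callable, Dict, List


def keys_changed_by_normalization(
    commands: Dict[str, list], normalize: Callable[[str], str]
) -> List[str]:
    """Return the keys that normalize() does not map onto themselves."""
    return sorted(key for key in commands if normalize(key) != key)


# Hypothetical normalizers mirroring the two regexes from the description.
def strip_non_letters(name: str) -> str:
    return re.sub(r"[^a-zA-Z]", "", name)


def keep_underscores(name: str) -> str:
    return re.sub(r"[^a-zA-Z_]", "", name)


# Toy table standing in for one language's section of the mapping file.
toy_commands = {"log_softmax": [], "softmax": [], "relu": []}

print(keys_changed_by_normalization(toy_commands, strip_non_letters))
# ['log_softmax']
print(keys_changed_by_normalization(toy_commands, keep_underscores))
# []
```

An empty result means that every key in the table can be looked up after normalization.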

Screenshots

-

System Information

I think it doesn't matter.
