UnicodeEncodeError in contract_path

The contraction_info function in quimb raised the following error on my circuit (a simple circuit with 40000-qubit entanglement):

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 30337: surrogates not allowed

I traced the code to the contract_path function in opt_einsum and found that the get_symbol function in parser.py can generate the offending character. '\ud800' is a surrogate (explained in https://www.informit.com/articles/article.aspx?p=2274038&seqNum=10), so it cannot be encoded as UTF-8, and the error is raised when contract_path tries to print input_subscripts. This could probably be solved by returning the surrogate's index instead:

get_symbol(55156)
#> '&#55296;'

However, I'm not sure whether this solution would cause other problems, since the returned string is no longer a single character. If returning a string of length > 1 is feasible, I would like to submit a pull request. If not, maybe get_symbol could skip the surrogates instead.
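
For reference, here is a minimal sketch of both the failure and the "skip the surrogates" option. It does not call opt_einsum itself: naive_get_symbol and get_symbol_skipping_surrogates are hypothetical names, and the i + 140 offset is an assumption inferred from get_symbol(55156) landing on code point 55296. The surrogate range U+D800 to U+DFFF comes from the Unicode standard.

```python
import string

# Assumption: the first 52 symbols are the ASCII letters, and larger indices
# are mapped to chr(i + OFFSET). OFFSET = 140 is inferred from the report
# (55156 + 140 == 55296 == 0xD800), not copied from the library source.
BASE_SYMBOLS = string.ascii_lowercase + string.ascii_uppercase
OFFSET = 140

# Surrogate code points are reserved for UTF-16 and cannot be encoded as UTF-8.
SURROGATE_START = 0xD800
SURROGATE_END = 0xDFFF
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2048 code points


def naive_get_symbol(i):
    """Hypothetical stand-in for the behaviour described in this issue."""
    if i < len(BASE_SYMBOLS):
        return BASE_SYMBOLS[i]
    return chr(i + OFFSET)


def get_symbol_skipping_surrogates(i):
    """Hypothetical variant that jumps over the surrogate block.

    Every index still maps to a distinct single character, so callers that
    rely on one character per index are unaffected.
    """
    if i < len(BASE_SYMBOLS):
        return BASE_SYMBOLS[i]
    code_point = i + OFFSET
    if code_point >= SURROGATE_START:
        # Shift past U+D800..U+DFFF so a surrogate is never returned.
        code_point += SURROGATE_COUNT
    return chr(code_point)


def can_encode_utf8(text):
    """Return True if text can be encoded as UTF-8 without errors."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# The index from the report hits the first surrogate with the naive mapping,
# but maps to an encodable character once the surrogate block is skipped.
naive_ok = can_encode_utf8(naive_get_symbol(55156))                 # False
skipping_ok = can_encode_utf8(get_symbol_skipping_surrogates(55156))  # True
```

With this variant the symbol stays a single character, so it avoids the length > 1 concern above, at the cost of shifting every index from 55156 upward by 2048 code points.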

dgasmith / opt_einsum

UnicodeEncodeError in contract_path #182