The contraction_info function in quimb raised UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 30337: surrogates not allowed on my circuit (a simple circuit with 40000-qubit entanglement). I traced the error to the contract_path function in opt_einsum.
The get_symbol function in parser.py appears to be the source. '\ud800' is a surrogate code point (explained in https://www.informit.com/articles/article.aspx?p=2274038&seqNum=10), and a lone surrogate cannot be encoded as UTF-8, so the error is raised when contract_path tries to print input_subscripts. This could probably be solved by returning the surrogate's index instead of the character itself.
get_symbol(55156)
#> '\ud800'
However, I'm not sure whether this would cause further problems, since the returned string would no longer be a single character. If returning a string of length > 1 is feasible, I would like to submit a pull request. If not, maybe get_symbol could skip the surrogates instead.
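To make the "skip the surrogates" option concrete, here is a minimal sketch. It assumes get_symbol maps the first 52 indices to ASCII letters and every later index i to chr(i + 140), which matches get_symbol(55156) giving '\ud800' but is a stand-in, not a copy of the library code. The names get_symbol_current, get_symbol_skip_surrogates, and can_encode are hypothetical and exist only for this illustration.

```python
import string

# Assumed stand-in for the current behaviour (not copied from opt_einsum):
# 52 ASCII letters first, then chr(i + 140) for every later index.
_BASE = string.ascii_lowercase + string.ascii_uppercase

SURROGATE_START = 0xD800  # 55296, first surrogate code point
SURROGATE_END = 0xDFFF    # 57343, last surrogate code point
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1  # 2048


def get_symbol_current(i: int) -> str:
    """Assumed current mapping; returns a lone surrogate for i = 55156."""
    if i < 52:
        return _BASE[i]
    return chr(i + 140)


def get_symbol_skip_surrogates(i: int) -> str:
    """Hypothetical variant that jumps over U+D800..U+DFFF."""
    if i < 52:
        return _BASE[i]
    code_point = i + 140
    if code_point >= SURROGATE_START:
        # Shift past the whole surrogate block so the result is encodable.
        code_point += SURROGATE_COUNT
    return chr(code_point)


def can_encode(symbol: str) -> bool:
    """Return True if the symbol can be encoded as UTF-8."""
    try:
        symbol.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(get_symbol_current(55156)))          # False
print(can_encode(get_symbol_skip_surrogates(55156)))  # True
```

With this variant every symbol stays a single character, so code that assumes one character per index should keep working. The cost is that the largest usable index shrinks by 2048.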