df2regulons splitting gene names by character

Hello, I have produced a regulons dataframe from the following code:

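import pandas as pd

# Use the second line of the file as the header (header=1)
reg = pd.read_csv(REGULONS_FNAME, sep=',', index_col=False, header=1)
# Name the two unnamed leading columns
reg = reg.rename(columns={'Unnamed: 0': 'TF', 'Unnamed: 1': 'MotifID'})
# Drop the row with index label 0
reg = reg.drop(0)
reg = reg.set_index(['TF', 'MotifID'])
reg.head()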

This matches the expected results from the tutorial. However, when I run df2regulons() before running aucell and inspect the resulting list, which should (to my understanding) contain the regulon name, the list of genes and their weights, and other data from the initial dataframe, the gene names are clearly wrong:

Regulon(name='Alx3(+)', gene2weight=frozendict.frozendict({'[': 1.0, '(': 1.0, "'": 1.0, 'P': 1.0, 'l': 1.0, 's': 1.0, 'c': 1.0, 'r': 1.0, '4': 1.0, ',': 1.0, ' ': 1.0, '0': 1.0, '.': 1.0, '7': 1.0, '9': 1.0, '6': 1.0, '8': 1.0, '5': 1.0, '1': 1.0, '2': 1.0, '3': 1.0, ')': 1.0, 'C': 1.0, 'p': 1.0, 'a': 1.0, 'I': 1.0, 'n': 1.0, 'M': 1.0, 'e': 1.0, 't': 1.0, 'F': 1.0, 'o': 1.0, 'x': 1.0, 'd': 1.0, 'R': 1.0, 'D': 1.0, 'i': 1.0, 'L': 1.0, 'T': 1.0, 'b': 1.0, 'E': 1.0, 'h': 1.0, 'N': 1.0, 'g': 1.0, 'f': 1.0, 'm': 1.0, 'K': 1.0, ']': 1.0, 'v': 1.0, 'S': 1.0, 'A': 1.0, 'k': 1.0, 'O': 1.0, 'G': 1.0, 'j': 1.0, 'U': 1.0, 'J': 1.0, 'W': 1.0, 'V': 1.0, 'w': 1.0, 'B': 1.0, 'H': 1.0, 'u': 1.0, 'z': 1.0}), gene2occurrence=frozendict.frozendict({}), transcription_factor='Alx3', context=frozenset({'metacluster_9.26.png', 'activating'}), score=3.343534473214638, nes=0.0, orthologous_identity=0.0, similarity_qvalue=0.0, annotation='')

Issue #505 dealt with dtype conversion to string for several fields in the dataframe. That does not seem to be the cause of this error, however:

print(reg.dtypes)

AUC                      float64
NES                      float64
MotifSimilarityQvalue    float64
OrthologousIdentity      float64
Annotation                object
Context                   object
TargetGenes               object
RankAtMax                float64
dtype: object

df2regulons() appears to be splitting the target gene string character by character. I believe this is causing my aucell matrix to contain only zeros, as it's reading incorrect gene names. Is there a workaround or solution for this? Thanks.
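A minimal sketch of what might be happening, assuming the TargetGenes column came back from the CSV as a plain string rather than a list of (gene, weight) tuples. An object dtype would look the same in both cases. The cell value below is hypothetical: the gene name Plscr4 is only inferred from the first characters in the output above, and Cpa6 and both weights are made up for illustration.

```python
import ast

# Hypothetical TargetGenes cell after a CSV round trip: a string that
# only looks like a list of (gene, weight) tuples.
target_genes_cell = "[('Plscr4', 0.79), ('Cpa6', 0.68)]"

# Iterating over a string yields single characters, which would explain
# keys such as '[', '(' and 'P' in gene2weight.
chars = list(dict.fromkeys(target_genes_cell))  # unique characters, in order

# ast.literal_eval safely parses the string back into Python objects.
parsed = ast.literal_eval(target_genes_cell)
gene2weight = dict(parsed)

print(chars[:4])
print(gene2weight)
```

If this is the cause, applying the same parse to the whole column before calling df2regulons(), for example with reg['TargetGenes'].apply(ast.literal_eval), might restore the gene names. This is an untested assumption about this particular dataframe. The pySCENIC tutorials also read this file with the load_motifs helper from pyscenic.utils, which may be worth trying instead of pd.read_csv.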

aertslab / pySCENIC

df2regulons splitting gene names by character #509