aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

An incorrect usage of a regular expression caused the quality control of PBMC tutorials to fail #283

Open JohnWang1997 opened 10 months ago

JohnWang1997 commented 10 months ago

Dear authors and developers, Thanks for all your work.

Describe the bug

When I tried to reproduce the quality control steps for scATAC-seq data following the official PBMC tutorial, no cells passed quality control, as also reported in issues #146 and #168. I tried the authors' suggestion to downgrade to pandas 1.5.0, but it did not resolve the issue.

I compared consensus_peaks with annot and noticed that chromosome names in consensus_peaks are written as 'chr' followed by the chromosome name (for example, chr1), while annot, imported from ensembl.org, uses the bare name (for example, 1). The tutorial reconciles the two formats by prepending the string "chr" to every entry in the chromosome name column. However, pandas versions differ in whether str.replace treats its pattern as a regular expression by default. With the code in the official tutorial (annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1')) and the currently recommended pandas==1.5 in my environment, the pattern was not treated as a regular expression, so nothing matched and no prefix was added. Passing regex=True explicitly resolves the issue.
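To illustrate why a literal interpretation of the pattern leaves the names untouched, here is a minimal sketch using only the Python standard library. It does not call pandas; plain str.replace stands in for literal matching (the regex=False behaviour) and re.sub for regular-expression matching (the regex=True behaviour). The sample value '1' is an assumption chosen for illustration.

```python
import re

pattern = r'(\b\S)'
replacement = r'chr\1'
name = '1'  # illustrative chromosome name in the Ensembl style

# Literal matching: searches for the exact text '(\b\S)', which never
# occurs in a chromosome name, so the value comes back unchanged.
literal_result = name.replace(pattern, replacement)

# Regular-expression matching: the first non-space character after a
# word boundary is captured and re-emitted with a 'chr' prefix.
regex_result = re.sub(pattern, replacement, name)

print(literal_result)  # 1
print(regex_result)    # chr1
```

When the prefix is silently not added, the chromosome names in annot never match those in consensus_peaks, which is consistent with no cells passing quality control.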

To Reproduce

The original code:

annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1')

The corrected code:

annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'^(?!chr)(.*)', r'chr\1', regex=True)
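As a quick check of the corrected call, the sketch below applies it to a small stand-in table. The DataFrame contents are made up for illustration (the real annot comes from Ensembl and has more columns), and the object dtype is set explicitly as an assumption so that the pattern is handled by Python's re engine. The regex-free variant at the end is an alternative added here for comparison, not something taken from the tutorial.

```python
import pandas as pd

col = 'Chromosome/scaffold name'

# Hypothetical miniature of the annotation table; one entry already
# carries the prefix to show that it is not prefixed twice.
annot = pd.DataFrame({col: ['1', 'X', 'chr2']}, dtype=object)

# Corrected call from this issue: regex=True is passed explicitly, and
# the negative lookahead skips names that already start with 'chr'.
fixed = annot[col].str.replace(r'^(?!chr)(.*)', r'chr\1', regex=True)

# Regex-free alternative (an assumption, not the tutorial's method):
# keep names that already start with 'chr', prefix all others.
names = annot[col].astype(str)
alt = names.where(names.str.startswith('chr'), 'chr' + names)

print(fixed.tolist())  # ['chr1', 'chrX', 'chr2']
print(alt.tolist())    # ['chr1', 'chrX', 'chr2']
```

Passing regex explicitly, or avoiding a pattern altogether, makes the result independent of the default that a given pandas version uses.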

With this change, quality control runs normally and produces the same results as the tutorial.