SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
An incorrect usage of a regular expression caused the quality control of PBMC tutorials to fail #283
Dear authors and developers,
Thanks for all your work.
Describe the bug
When I tried to reproduce the quality control steps for scATAC-seq data following the official PBMC tutorial, no cells passed quality control, as also reported in issues #146 and #168. I tried the authors' suggestion to downgrade to pandas 1.5.0, but it did not resolve the issue.
I compared consensus_peaks with annot and noticed that the chromosome names in consensus_peaks are 'chr' followed by the chromosome identifier, while the names in annot, imported from ensembl.org, are just the chromosome identifier. The tutorial reconciles the two formats by prepending the string "chr" to each value in the chromosome name column. However, pandas versions differ in how str.replace handles regular expressions, and the code in the official tutorial (annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1')) does not work with the currently recommended pandas==1.5: the pattern is not treated as a regular expression by default, so nothing matches and the names stay unchanged. This can be resolved by passing the regex=True parameter.
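To make the difference concrete, here is a minimal sketch on a toy column. The chromosome values are illustrative stand-ins rather than the real Ensembl annotation, and `regex` is passed explicitly in both calls so the result does not depend on the version-specific default (pandas 2.0 changed the default of `Series.str.replace` to `regex=False`).

```python
import pandas as pd

# Toy stand-in for the Ensembl annotation column (values are illustrative).
col = "Chromosome/scaffold name"
annot = pd.DataFrame({col: ["1", "2", "X", "MT"]})

# Literal matching: pandas looks for the text "(\b\S)" itself, which never
# occurs in the column, so the names come back unchanged.
literal = annot[col].str.replace(r"(\b\S)", r"chr\1", regex=False)

# Regex matching: the first non-space character at a word boundary is
# captured and written back with a "chr" prefix.
as_regex = annot[col].str.replace(r"(\b\S)", r"chr\1", regex=True)

print(literal.tolist())   # ['1', '2', 'X', 'MT']
print(as_regex.tolist())  # ['chr1', 'chr2', 'chrX', 'chrMT']
```

With literal matching the annotation keeps names such as "1", which never equal "chr1" in the peak table, which is consistent with no cells passing quality control.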
To Reproduce
The original code:
annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1')
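The replaced code:
annot['Chromosome/scaffold name'] = annot['Chromosome/scaffold name'].str.replace(r'(\b\S)', r'chr\1', regex=True)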
With this change, the quality control works normally and outputs the same results as the tutorial.
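A quick sanity check before running quality control is to confirm that the two tables actually share chromosome names. The sketch below is only an illustration: `shared_chromosomes` is a hypothetical helper, not part of SCENIC+ or pycisTopic, and the series are toy data standing in for the annotation and consensus-peak chromosome columns.

```python
import pandas as pd


def shared_chromosomes(annot_names, peak_names):
    """Return the sorted chromosome names present in both collections."""
    return sorted(set(annot_names) & set(peak_names))


# Toy data (hypothetical): peak table uses "chr"-prefixed names,
# the annotation uses bare identifiers until the prefix is added.
peaks = pd.Series(["chr1", "chr2", "chrX"])
before = pd.Series(["1", "2", "X"])
after = "chr" + before  # plain string concatenation, no regex involved

print(shared_chromosomes(before, peaks))  # []
print(shared_chromosomes(after, peaks))   # ['chr1', 'chr2', 'chrX']
```

An empty overlap is a strong hint that the naming conventions still differ and that the prefix step did not take effect.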