lisch7 opened this issue 10 months ago
Moreover, why not integrate CellTypist and BBKNN into OV? I think some of the excellent batch-removal and annotation strategies you previously shared on your WeChat Official Account could also be integrated into the OV workflow.
Thank you for your suggestion. Here is our response:
The raw counts are stored in `adata.layers['counts']`, so you can use `adata.layers['counts']` to get the raw values if you need them. `adata.layers['scaled']` is used only for the PCA calculation.
Thanks again for your suggestions.
Zehua
Thanks for your kind reply, but it seems I didn't express myself clearly. I appreciate the opportunity to clarify my suggestions:
Flexibility in `ov.pp.preprocess`.
I noticed that in the preprocessing step (`ov.pp.preprocess`), certain parameters are hard-coded, for example in this snippet from lines 372-379 of `_preprocess.py`:
```python
sc.pp.normalize_total(
    adata,
    target_sum=target_sum,
    exclude_highly_expressed=True,
    max_fraction=0.2,
)
```
Here, `exclude_highly_expressed=True` is always applied, whereas in scanpy the default of `exclude_highly_expressed` is `False` (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.normalize_total.html#scanpy.pp.normalize_total). Setting it to `True` may well be more appropriate based on your experience, but I think staying consistent with the classic tutorial would be preferable. My suggestion is therefore to let users adjust this setting, perhaps through an additional parameter in `ov.pp.preprocess`. This would give more control over preprocessing depending on the characteristics of the data.
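To make the effect of this flag concrete, here is a NumPy-only sketch of what `exclude_highly_expressed` changes, following the rule stated in the scanpy documentation: a gene is left out of the size factor if it holds more than `max_fraction` of a cell's total counts in at least one cell. The function name `normalize_total_sketch` is hypothetical; it is not scanpy's implementation, and it uses a fixed `target_sum` instead of scanpy's median default.

```python
import numpy as np


def normalize_total_sketch(X, target_sum=1e4, exclude_highly_expressed=False, max_fraction=0.05):
    """Per-cell total-count normalization (illustrative, not scanpy's code).

    With exclude_highly_expressed=True, a gene counts as highly expressed if it
    holds more than max_fraction of a cell's total counts in at least one cell.
    Such genes are left out of the size factor (but are still rescaled), so the
    remaining genes sum to target_sum in every cell.
    """
    X = np.asarray(X, dtype=float)
    size_factors = X.sum(axis=1)
    if exclude_highly_expressed:
        # True for genes that exceed max_fraction of the total in any cell
        highly = (X > size_factors[:, None] * max_fraction).any(axis=0)
        size_factors = X[:, ~highly].sum(axis=1)
    return X / size_factors[:, None] * target_sum


# Two toy cells with 100 counts each; gene 0 dominates the first cell
X = np.array([[90.0, 5.0, 5.0],
              [10.0, 45.0, 45.0]])
default = normalize_total_sketch(X, target_sum=100)
excluded = normalize_total_sketch(X, target_sum=100, exclude_highly_expressed=True, max_fraction=0.5)
```

With the flag off, both toy cells already total 100 and come back unchanged; with it on, gene 0 (90% of the first cell's counts) is dropped from the size factor, so the first cell is scaled up tenfold. A data-dependent swing like this is the argument for exposing the flag rather than fixing it.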
Expanding `regress` functionality.
In the `regress` function (lines 437-452 in `_preprocess.py`), the covariates `mito_perc` and `nUMIs` are fixed regression targets. I propose making this function more flexible by letting users specify which variables to regress out. This could be crucial for analyses where other variables are more relevant.
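scanpy's `sc.pp.regress_out(adata, keys)` already takes a list of `obs` column names, so one option would be for `regress` to forward a user-supplied list instead of the fixed `['mito_perc', 'nUMIs']`. The NumPy sketch below shows the underlying idea with an arbitrary key list: one least-squares fit per gene with an intercept plus the chosen covariates, keeping the residuals. `regress_out_sketch` and the `S_score` covariate are hypothetical, and this is not a claim about how scanpy or OmicVerse implement regression internally.

```python
import numpy as np


def regress_out_sketch(X, obs, keys):
    """Return residuals of X after regressing out the covariates named in keys.

    Fits one ordinary-least-squares model per gene (column of X) with an
    intercept plus the chosen covariates; obs maps covariate names to
    per-cell values.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack(
        [np.ones(X.shape[0])] + [np.asarray(obs[k], dtype=float) for k in keys]
    )
    # lstsq solves all genes at once: one coefficient column per gene
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ coef


rng = np.random.default_rng(0)
n_cells = 200
obs = {
    "mito_perc": rng.uniform(0, 20, n_cells),
    "nUMIs": rng.uniform(1e3, 1e4, n_cells),
    "S_score": rng.normal(0, 1, n_cells),  # hypothetical extra covariate
}
# Toy expression driven by mito_perc and S_score plus small noise
X = (
    np.outer(obs["mito_perc"], [0.5, -0.2, 0.1])
    + np.outer(obs["S_score"], [1.0, 2.0, -1.0])
    + rng.normal(0, 0.1, (n_cells, 3))
)
resid = regress_out_sketch(X, obs, keys=["mito_perc", "S_score"])
```

Because the key list is just an argument, the same function covers the current `['mito_perc', 'nUMIs']` case and any other `obs` columns a user cares about.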
Concerns with `regress_and_scale`.
Regarding the `regress_and_scale` function, particularly line 471 in `_preprocess.py`, I wonder whether the code `adata_mock = scale(adata_mock)` could be changed to `adata_mock = sc.pp.scale(adata_mock)`. As someone relatively new to Python, I'm not sure whether this change would be more appropriate or efficient, and would appreciate your insight.
Looking forward to your thoughts on this.
Thanks for your advice; we will add more parameters in the next version.
Zehua
Following up on this issue, I have one question. Do the authors have any advice on saving RAM when using OV to analyse huge datasets? I'm not sure which computations in `pp.preprocess` take so much RAM; could `gc.collect()` be used to release the memory?
Additionally, there are some mistakes in the tutorials (https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_single_batch/).
An example:
A similar problem occurs in the other batch-correction tutorials.
If you want to save RAM, you might consider setting the argument `backed='r'` when reading h5ad files with `sc.read` or `ov.read`.
Can `backed='r'` help save memory during `pp.preprocess`? I remember that it only speeds up reading the files. Anyway, I will try it.
There are bugs in the function `ov.pp.regress_and_scale`.
I think the recalculation of `scaled|original|X_pca` should be removed: the function `batch_correction` does not ask us to input log-transformed counts or scaled data, and passing it scaled data would lead to the data being scaled twice.
Hi @Starlitnightly, I noticed that recent tutorials haven't covered the usage of `ov.utils.store_layers` and `ov.utils.retrieve_layers`. I believe these are useful functions for saving and retrieving count layers. Are there any more efficient tools available for this?
Hi,
First, I'd like to commend the OV project for its contributions to scRNA-seq analysis. I have a few suggestions that could enhance its utility:
Flexibility in `ov.pp.preprocess`: this function integrates several key processing steps, but some of them, such as robust gene identification and gene filtering, are mandatory. It might be beneficial to offer more control here, for instance a parameter such as `robust_gene=True, threshold=0.05` that lets users toggle this feature. Similarly, the mandatory `sc.pp.normalize_total(..., exclude_highly_expressed=True, ...)` could be made optional through a control parameter.
Expanding `regress` functionality: currently, the `regress` function is limited to the `mito_perc` and `nUMIs` covariates. It would be useful to allow regression on other variables as users require.
Concerns with `regress_and_scale`: in the current implementation, I'm wondering whether replacing `adata_mock = scale(adata_mock)` with `adata_mock = sc.pp.scale(adata_mock)` at line 471 would be more appropriate. This change could potentially improve the function's performance or accuracy.
I believe these enhancements would make OV even more flexible and user-friendly across diverse scRNA-seq analysis scenarios. Looking forward to your thoughts on this.
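The `robust_gene=True, threshold=0.05` toggle proposed in this comment could look roughly like the sketch below. Both parameter names come from the proposal, but their meaning here is an assumption: `threshold` is read as the minimum fraction of cells in which a gene must be detected, which is not necessarily how OmicVerse identifies robust genes, and `robust_gene_mask` is a hypothetical helper.

```python
import numpy as np


def robust_gene_mask(X, robust_gene=True, threshold=0.05):
    """Boolean mask of genes to keep (hypothetical semantics).

    Assumes `threshold` is the minimum fraction of cells in which a gene must
    have a non-zero count; with robust_gene=False every gene is kept.
    """
    X = np.asarray(X)
    if not robust_gene:
        return np.ones(X.shape[1], dtype=bool)
    frac_cells_detected = (X > 0).mean(axis=0)
    return frac_cells_detected >= threshold


X = np.zeros((100, 3))
X[:50, 0] = 1  # detected in 50% of cells
X[:3, 1] = 1   # detected in 3% of cells
X[:, 2] = 2    # detected in every cell
kept = robust_gene_mask(X, threshold=0.05)
all_kept = robust_gene_mask(X, robust_gene=False)
```

Returning a mask rather than filtering in place leaves the caller free to log or inspect which genes would be dropped before applying it.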