cellatlas / mx


Questions about mx normalize and assign #4

Closed BKover99 closed 5 days ago

BKover99 commented 6 days ago

Hey, first of all thanks for developing this package. I have two quick questions regarding mx normalize and mx assign.

1. mx normalize

In https://www.biorxiv.org/content/10.1101/2024.03.23.586413v1.full you write

“We chose the PFlog1pPF normalization (Sina Booeshaghi et al. 2022) as it effectively removes the mean-variance relationship while preserving depth normalization (Figure 1d,e,f).”

While in https://www.biorxiv.org/content/10.1101/2024.03.23.586412v1.full you write:

“Gene count matrices were normalized using ‘mx norm’, which uses the log1pPF method (Sina Booeshaghi et al. 2022).”

I am sure the misunderstanding is on my end, but aren’t PFlog1pPF and log1pPF different approaches? Which one did you use for the CCA?

Additionally, if one were to use PFlog1pPF, am I correct that it would look like the following in mx:

mxnorm_command1 = f"mx normalize -o {output_mat} -m log1pPF {output_mat}"
!{mxnorm_command1}
mxnorm_command2 = f"mx normalize -o {output_mat} -m PF {output_mat}"
!{mxnorm_command2}

2. mx assign

I was able to generate the required ec files and ran the following command (on a small-ish dataset of about 1500 cells) in Google Colab+ (50 GB RAM instance):

mx_assign_command = f"mx assign -g {marker_folder}/groups.txt -gi {genes} -bi {barcodes_out} -e {marker_folder}/markers.ec -o {outputfile}  {output_mat}"
!{mx_assign_command}

This always ran for about 30 seconds and then exited, printing “^C”, which suggests that the process was interrupted. Have you experienced this issue within Colab?

Thanks for the help in advance!

sbooeshaghi commented 6 days ago

Hi,

The command you wrote should generate the PFlog1pPF matrix. For the CCA paper we chose to stick with log1pPF to be consistent with the standard normalization performed in most of the papers from which we downloaded the data. The reason for this is that marker genes were derived from log1pPF matrices in those papers.
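To make the difference between the two names concrete, here is a minimal sketch of the two pipelines. It assumes that PF (proportional fitting) rescales every cell to the mean cell depth, that cells are rows, and that log1pPF means log1p applied after PF; it uses plain Python lists to stay dependency-free, and the function names are illustrative only, not part of mx:

```python
import math

def pf(rows):
    """Proportional fitting: rescale each row (cell) so its total equals the mean row total."""
    depths = [sum(r) for r in rows]
    target = sum(depths) / len(depths)
    # Cells with zero depth are left as all zeros to avoid dividing by zero.
    return [[v * target / d if d > 0 else 0.0 for v in r]
            for r, d in zip(rows, depths)]

def log1p_pf(rows):
    """log1pPF: PF, then log1p of every entry."""
    return [[math.log1p(v) for v in r] for r in pf(rows)]

def pf_log1p_pf(rows):
    """PFlog1pPF: log1pPF followed by a second PF step."""
    return pf(log1p_pf(rows))

# Toy cells-by-genes count matrix (made-up numbers for illustration).
counts = [[1, 2, 3], [10, 20, 30], [0, 5, 5]]

log1p_pf_depths = [sum(r) for r in log1p_pf(counts)]
pflog1ppf_depths = [sum(r) for r in pf_log1p_pf(counts)]
print("row sums after log1pPF:  ", log1p_pf_depths)
print("row sums after PFlog1pPF:", pflog1ppf_depths)
```

In this toy example the row sums after log1pPF are no longer equal across cells, because log1p is nonlinear; the second PF step makes them equal again, which is the sense in which PFlog1pPF preserves depth normalization.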

Separately, I've not experienced the issue you describe with mx assign. Can you verify the size of the matrix and that it is in .mtx format? Try printing the head of the file:

head matrix.mtx

BKover99 commented 5 days ago

Hey, in the end I realised what the problem was: I hadn’t run the mx_extract command to obtain the submatrix to use for mx assign. Now it all works. Thanks for the help!
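For anyone hitting a similar symptom in a notebook, the check suggested earlier in the thread (confirming that the file is in .mtx format and looking at its size) can also be done from Python. This is a minimal sketch, assuming an uncompressed Matrix Market file in coordinate format; the helper name `mtx_summary` is hypothetical and not part of mx:

```python
def mtx_summary(path):
    """Return (banner, n_rows, n_cols, n_entries) for a Matrix Market coordinate file."""
    with open(path) as fh:
        # The first line of a Matrix Market file is a banner starting with %%MatrixMarket.
        banner = fh.readline().strip()
        if not banner.startswith("%%MatrixMarket"):
            raise ValueError(f"{path} does not start with a MatrixMarket banner")
        # Skip comment lines (starting with %) and blank lines; the first
        # remaining line holds the dimensions and the number of stored entries.
        for line in fh:
            line = line.strip()
            if line and not line.startswith("%"):
                n_rows, n_cols, n_entries = (int(x) for x in line.split()[:3])
                return banner, n_rows, n_cols, n_entries
    raise ValueError(f"{path} has no size line")
```

Calling `mtx_summary("matrix.mtx")` on the file passed to mx assign shows its dimensions at a glance, which makes it easier to tell whether it is the full matrix or the extracted submatrix.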

sbooeshaghi commented 5 days ago

Awesome! Glad it worked. Thank you for putting this issue up.