Teichlab / cellphonedb

MIT License

Invalid Counts data #303

Closed howardchen0810 closed 3 years ago

howardchen0810 commented 3 years ago

Hi, here is my count data. Because it is a big file, here is the link: https://drive.google.com/file/d/12fr_OYZSX4i-yLLcyc0C8UmT0vAoDKSl/view?usp=sharing and here is my meta data: celltype_tumor.txt

Here is the code I ran: cellphonedb method analysis celltype.txt raw_data.txt --counts-data=gene_name

Could you help me see what's wrong with my files? Thanks, Howard

prete commented 3 years ago

Hi @howardchen0810, it looks like the cells in your meta (celltype_tumor.txt) don't match the columns in your counts (raw_data.txt). None of your count columns end with _1, _2 or _3 like your meta Cells. Could you double check that?
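One way to double check is a small pandas comparison of the cell names on both sides. This is only a sketch, not part of CellPhoneDB: the helper name `compare_cells` is hypothetical, and it assumes the meta file keeps its cell identifiers in a column called `Cell` and that the counts file has one column per cell with gene names as the row index. The tiny data frames are made up for illustration.

```python
import pandas as pd


def compare_cells(meta: pd.DataFrame, counts: pd.DataFrame, cell_column: str = "Cell"):
    """Return (cells only in meta, cells only in counts), each sorted.

    Hypothetical helper: assumes `meta[cell_column]` lists the cell names
    and `counts` has one column per cell.
    """
    meta_cells = set(meta[cell_column].astype(str))
    count_cells = set(counts.columns.astype(str))
    return sorted(meta_cells - count_cells), sorted(count_cells - meta_cells)


# Made-up example: meta cells carry a _1/_2 suffix, count columns do not.
meta = pd.DataFrame({"Cell": ["AAAC_1", "AAAG_2"], "cell_type": ["T", "B"]})
counts = pd.DataFrame({"AAAC": [0, 3], "AAAG": [1, 0]}, index=["GeneA", "GeneB"])

only_meta, only_counts = compare_cells(meta, counts)
print("In meta but not in counts:", only_meta)
print("In counts but not in meta:", only_counts)
```

For the real files, something like `pd.read_csv("celltype_tumor.txt", sep="\t")` and `pd.read_csv("raw_data.txt", sep="\t", index_col=0)` could be passed in, assuming both are tab-separated. Both lists should come back empty when meta and counts agree.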

howardchen0810 commented 3 years ago

Hi, Thank you for your reply. From my count data file, I don't think there are cells ending with _1, _2 or _3. I just checked. Could you double check for me? I appreciate your help.

prete commented 3 years ago

That's the issue! The cells in your meta don't match the columns in your counts, and they are supposed to. image

howardchen0810 commented 3 years ago

oh! Sorry. I attached the wrong file. celltype.txt

This is the one I used. Could you run this one and check with me?

image

Here is the code and results I have.

Thanks!

prete commented 3 years ago

@howardchen0810 You keep changing files and commands, which makes this hard to follow. Before I look at this again, can you confirm which versions of CellPhoneDB and pandas you are using? pip show cellphonedb pandas

howardchen0810 commented 3 years ago

Here is the version I used. image

prete commented 3 years ago

I took both files you provided (celltype.txt and raw_data.txt) and ran cellphonedb with this command:

cellphonedb method statistical_analysis celltype.txt raw_data.txt --counts-data=gene_name

It took 20 min and 18 GB of RAM, but it didn't fail. Are you sure you're using the same dataset you shared?

howardchen0810 commented 3 years ago

I am 100% sure I am using the same files I provided to you. It's probably because it used too much RAM.

prete commented 3 years ago

I'm not sure. That error probably comes from this chunk of code. But the files you shared shouldn't have a problem with those validations. I could ask for your numpy version, in case counts.astype(np.float) is failing, but that's a long shot. For example, you could try running this and see if you get an error:

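import pandas as pd
import numpy as np

# Read the tab-separated counts; the first column holds the gene names, so use it as the index
counts = pd.read_csv("raw_data.txt", sep="\t", index_col=0)
# Same cast as above; raises on non-numeric values, or on NumPy 1.24+, where np.float no longer exists
counts = counts.astype(np.float)

howardchen0810 commented 3 years ago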

Hi, I just finished the run after requesting more memory on the server and it works!!! I have another question about running dot_plot. This error always pops up:

Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang

Do you know how to fix it? Thanks

prete commented 3 years ago

@howardchen0810 That's caused by multiple packages defining functions with the same name. It should be more of a warning than an actual error. The last loaded package (ggplot2) takes precedence for the function name, and that's probably what you want when plotting.

Does that message prevent you from getting the plot output (i.e. plot.pdf)?

prete commented 3 years ago

The plotting "Registered S3 methods overwritten" warning is not related to the "invalid counts" issue. @howardchen0810, if that prevents you from getting your plot, please open a new ticket.