bioinfo-ibms-pumc / SCSA

SCSA: cell type annotation for single-cell RNA-seq data
GNU General Public License v3.0

annotate with cellranger file #7

Open phoebee-h opened 3 years ago

phoebee-h commented 3 years ago

Hi, I wonder how the cellranger_pbmc_3k.csv in your instructions was generated. It does not seem to be included in any subdirectory of the CellRanger pipeline output (v3.1.0 / v4.0.0). What does "Weight" mean in that column?

python3 SCSA.py -d whole.db -i cellranger_pbmc_3k.csv -k All -g Human -p 0.01 -f 1.5 -m txt -o sc.txt

In your file, each cluster appears to have two columns ("Weight" and "UMI counts/cell"):

image

But mine in the outs/analysis/diffexp/graphcluster/differential_expression.csv looks like this:

image

Thank you.

bioinfo-ibms-pumc commented 3 years ago

Sorry for the delay. The uploaded test file was generated by a previous version of Cellranger. Your file should be OK, because SCSA can handle the results of Cellranger v3. Please let me know if you cannot get the cell type result.

phoebee-h commented 3 years ago

Hi, Thanks for the response! Unfortunately, it does not work for the data frame as mentioned above.

python3 SCSA.py -d whole.db -i SCSA_E18_mouse_globally.csv -k All -g Human -p 0.01 -f 1.5 -m txt -o scsc_E18.txt

Version V1.1 [2020/07/03]
DB load: 47347 3 3 48257 37440
Namespace(Gensymbol=False, MarkerDB=None, celltype='normal', cluster='all', db='whole.db', foldchange=1.5, input='SCSA_E18_mouse_globally.csv', list_tissue=False, noprint=False, norefdb=False, outfmt='txt', output='scsc_E18.txt', pvalue=0.01, source='cellranger', species='Human', target='cellmarker', tissue='All', weight=100.0)
Version V1.1 [2020/07/03]
DB load: 47347 3 3 48257 37440
load markers: 45409
############################## Cluster 1 ##############################

Cluster 1 Weight column not in the input table!

So, it looks like I need to modify something related to the "Weight" column?

bioinfo-ibms-pumc commented 3 years ago

Sorry for the inconvenience. Cellranger results come mainly in three versions: v1, v2 and v3. Each version has a different header line, so please attach the header line of the "differential_expression.csv" you mentioned above. Then I can find a solution. Thanks a lot!
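For anyone who needs to report a header line as requested here, the following is a minimal sketch, not part of SCSA. The function name `read_header` is hypothetical, and the delimiter guess (comma versus tab, whichever occurs more often in the first line) is an assumption that fits the files discussed in this thread.

```python
import csv
import io


def read_header(text, delimiters=",\t"):
    """Return (delimiter, column names) for the first line of a table.

    Hypothetical helper: it only inspects the first line and assumes the
    table is either comma- or tab-separated.
    """
    first_line = text.splitlines()[0]
    # Pick whichever candidate delimiter occurs more often in the header.
    delimiter = max(delimiters, key=first_line.count)
    columns = next(csv.reader(io.StringIO(first_line), delimiter=delimiter))
    return delimiter, columns


# Header copied from this thread (first cluster only), tab-separated.
sample = (
    "FeatureID\tFeatureName\tCluster 1 Average\t"
    "Cluster 1 Log2 Fold Change\tCluster 1 P-Value\n"
)
delimiter, columns = read_header(sample)
print(repr(delimiter), columns)
```

For a file on disk, the same function could be called as `read_header(open("differential_expression.csv").read())`.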

phoebee-h commented 3 years ago

No problem, and thanks for developing the tool.

python3 ../SCSA/SCSA.py -d ../SCSA/whole.db -i differential_expression.csv -k All -g Human -p 0.01 -f 1.5 -m txt -o sc.txt

Version V1.1 [2020/07/03]
DB load: 47347 3 3 48257 37440
Namespace(Gensymbol=False, MarkerDB=None, celltype='normal', cluster='all', db='../SCSA/whole.db', foldchange=1.5, input='differential_expression.csv', list_tissue=False, noprint=False, norefdb=False, outfmt='txt', output='sc.txt', pvalue=0.01, source='cellranger', species='Human', target='cellmarker', tissue='All', weight=100.0)
Version V1.1 [2020/07/03]
DB load: 47347 3 3 48257 37440
load markers: 45409
############################## Cluster 1 ##############################

Cluster 1 Weight column not in the input table!

The "differential_expression.csv" is attached, with the extension changed to ".txt" so that it could be uploaded here: differential_expression.txt

Headlines are like this:

FeatureID | FeatureName | Cluster 1 Average | Cluster 1 Log2 Fold Change | Cluster 1 P-Value | Cluster 2 Average | Cluster 2 Log2 Fold Change | Cluster 2 P-Value | Cluster 3 Average | Cluster 3 Log2 Fold Change | Cluster 3 P-Value | Cluster 4 Average | Cluster 4 Log2 Fold Change | Cluster 4 P-Value | Cluster 5 Average | Cluster 5 Log2 Fold Change | Cluster 5 P-Value | Cluster 6 Average | Cluster 6 Log2 Fold Change | Cluster 6 P-Value | Cluster 7 Average | Cluster 7 Log2 Fold Change | Cluster 7 P-Value

Thank you.

bioinfo-ibms-pumc commented 3 years ago

Hi, phoebee-h. Please try this file, which I have just modified: differential_expression.txt. The differences are: 1) the format is changed from TSV to CSV; 2) the headers are replaced strictly with "Feature ID", "Log2 fold change", etc. This is necessary because header lines vary between Cellranger versions and with users' own editing, so SCSA does not try to guess a header even when it looks similar to the expected one.
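The two changes described above (TSV to CSV, and strict header names) could be scripted roughly as follows. This is a sketch under stated assumptions, not SCSA code: the helper names are hypothetical, only "Feature ID" and "Log2 fold change" are quoted by the maintainer, "Feature Name" and "Adjusted p value" come from the follow-up question in this thread, and "Mean Counts" as the replacement for "Average" is an assumption about the Cellranger v3 layout. The data row in the example is made up.

```python
import csv
import io
import re

# Assumed target names; see the caveats in the paragraph above.
FIXED = {"FeatureID": "Feature ID", "FeatureName": "Feature Name"}
PER_CLUSTER = {
    "Average": "Mean Counts",  # assumption, not confirmed in this thread
    "Log2 Fold Change": "Log2 fold change",
    "P-Value": "Adjusted p value",  # relabels only; values are unchanged
}
CLUSTER_RE = re.compile(r"^(Cluster \d+) (.+)$")


def rename_column(name):
    """Map one header from the tab-separated export to the assumed name."""
    name = name.strip()
    if name in FIXED:
        return FIXED[name]
    match = CLUSTER_RE.match(name)
    if match and match.group(2) in PER_CLUSTER:
        return f"{match.group(1)} {PER_CLUSTER[match.group(2)]}"
    return name  # leave unrecognised columns untouched


def tsv_to_scsa_csv(src, dst):
    """Copy a tab-separated table to CSV, renaming only the header row."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow([rename_column(col) for col in next(reader)])
    writer.writerows(reader)


# Header from this thread (first cluster only) plus one made-up data row.
src = io.StringIO(
    "FeatureID\tFeatureName\tCluster 1 Average\t"
    "Cluster 1 Log2 Fold Change\tCluster 1 P-Value\n"
    "gene_id_1\tGENE1\t1.5\t2.0\t0.001\n"
)
dst = io.StringIO()
tsv_to_scsa_csv(src, dst)
converted = dst.getvalue()
print(converted)
```

With real files, `src` and `dst` would be handles opened with `newline=""`, as the Python `csv` module recommends. Note that renaming "P-Value" does not adjust the values; whether the exported p-values are already adjusted depends on how the table was produced.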

phoebee-h commented 3 years ago

Thanks. It works.

So, for the header line of the DEG table (.csv) from the cellranger output files, are the following required? Or what other rules are there?

  1. "Feature ID" (with space) instead of "FeatureID"
  2. "Feature Name" (with space) instead of "FeatureName"
  3. "Adjusted p value" instead of "P-Value"

Thank you again!