gaolabtools / scNanoGPS

Single cell Nanopore sequencing data for Genotype and Phenotype

Problem with --min_gene_no option of reporter_expression.py #37

Closed donginLee37 closed 2 months ago

donginLee37 commented 5 months ago

Hi Cheng-kai,

Thanks for answering my earlier questions, and thanks in advance for your help with this one. While testing the '--min_gene_no' option of reporter_expression.py, I ran into some problems and would like to ask for your help.

I ran the commands below to test the --min_gene_no option:

python3 ~/shared/scNanoGPS/reporter_expression.py -t 24 --gtf ~/shared/scNanoGPS/genes.gtf --featurecounts $(which featureCounts) \
-d ${working}/${sample}/ngene_test/ --min_gene_no 1 -o 1_matrix.tsv --log 1_reporter_expression.log.txt --sel_bc_o 1_filtered_barcode_list.txt

python3 ~/shared/scNanoGPS/reporter_expression.py -t 24 --gtf ~/shared/scNanoGPS/genes.gtf --featurecounts $(which featureCounts) \
-d ${working}/${sample}/ngene_test/ --min_gene_no 10 -o 10_matrix.tsv --log 10_reporter_expression.log.txt --sel_bc_o 10_filtered_barcode_list.txt

python3 ~/shared/scNanoGPS/reporter_expression.py -t 24 --gtf ~/shared/scNanoGPS/genes.gtf --featurecounts $(which featureCounts) \
-d ${working}/${sample}/ngene_test/ --min_gene_no 50 -o 50_matrix.tsv --log 50_reporter_expression.log.txt --sel_bc_o 50_filtered_barcode_list.txt

python3 ~/shared/scNanoGPS/reporter_expression.py -t 24 --gtf ~/shared/scNanoGPS/genes.gtf --featurecounts $(which featureCounts) \
-d ${working}/${sample}/ngene_test/ --min_gene_no 250 -o 250_matrix.tsv --log 250_reporter_expression.log.txt --sel_bc_o 250_filtered_barcode_list.txt

python3 ~/shared/scNanoGPS/reporter_expression.py -t 24 --gtf ~/shared/scNanoGPS/genes.gtf --featurecounts $(which featureCounts) \
-d ${working}/${sample}/ngene_test/ -o def_matrix.tsv --log def_reporter_expression.log.txt --sel_bc_o def_filtered_barcode_list.txt

(Here "def" means default: --min_gene_no was not set.)

I got the filtered cell numbers below, which look very odd:

Initial cell number = 4816
--min_gene_no 1 --> filtered cell number: 4816
--min_gene_no 10 --> filtered cell number: 4816
--min_gene_no 50 --> filtered cell number: 1242
--min_gene_no 250 --> filtered cell number: 2112
--min_gene_no not set (default, maybe 300?) --> filtered cell number: 4058

My assumption is that a lower min_gene cutoff should leave a higher number of filtered cells, but the results for 50 and for the default seem to be out of the trend.
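The expectation above can be written down as a small check. This is only a sketch: the toy per-cell gene counts and the helper names `cells_passing` and `is_non_increasing` are made up for illustration and are not part of scNanoGPS. The reported numbers are copied from the list above, with the default run placed last on the assumption that its cutoff is higher than 250.

```python
def cells_passing(genes_per_cell, min_gene_no):
    """Count cells with at least min_gene_no detected genes (integer comparison)."""
    return sum(1 for n in genes_per_cell if n >= min_gene_no)


def is_non_increasing(counts):
    """True if each value is less than or equal to the one before it."""
    return all(a >= b for a, b in zip(counts, counts[1:]))


# Hypothetical genes-per-cell values, only to show the expected behaviour.
toy = [5, 12, 48, 75, 260, 310, 900]
cutoffs = [1, 10, 50, 250, 300]
toy_counts = [cells_passing(toy, c) for c in cutoffs]

# Cell numbers reported above, in cutoff order 1, 10, 50, 250, default.
reported = [4816, 4816, 1242, 2112, 4058]

print("toy:", toy_counts, is_non_increasing(toy_counts))
print("reported:", reported, is_non_increasing(reported))
```

With a correct integer filter, raising the cutoff can never increase the number of cells kept, so the jump from 1242 cells at cutoff 50 to 2112 cells at cutoff 250 already breaks the expected trend, whatever the default cutoff is.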

Also, when I analyzed them with Seurat, I found some weird plots like below:

[image: Seurat plots]

Judging from these results, the matrix.tsv files for --min_gene_no 50 and 250 (gene_cut_50 and gene_cut_250) appear to have a problem.

To clarify, I ran all of these twice within the same job, and then once more as separate jobs on different nodes, but the results were always the same.

Could you help me find the cause of these problems?

Best,

Dongin

shiauck commented 4 months ago

Hi Dongin,

Thanks for the feedback. It looks like a data type is being handled incorrectly. I'll review the code.

Regards, Cheng-Kai
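For readers wondering how a data type problem could produce counts that go down and then up again, here is one possible mechanism, shown on made-up data. It is an assumption for illustration only: the thread does not confirm that this is what happens inside reporter_expression.py, and the function names below are hypothetical. If a cutoff given on the command line stays a string and gene counts are compared as text, the comparison is lexicographic rather than numeric.

```python
def cells_passing_str(genes_per_cell, min_gene_no):
    """Buggy variant: both sides compared as text (lexicographic order)."""
    return sum(1 for n in genes_per_cell if str(n) >= str(min_gene_no))


def cells_passing_int(genes_per_cell, min_gene_no):
    """Intended variant: both sides compared as integers."""
    return sum(1 for n in genes_per_cell if int(n) >= int(min_gene_no))


# Hypothetical genes-per-cell values, not taken from the real data.
toy = [5, 12, 48, 75, 260, 310, 900]
cutoffs = ["1", "10", "50", "250"]  # cutoffs as they arrive from a command line

str_counts = [cells_passing_str(toy, c) for c in cutoffs]
int_counts = [cells_passing_int(toy, c) for c in cutoffs]

for cutoff, s, i in zip(cutoffs, str_counts, int_counts):
    print(f"cutoff {cutoff}: text comparison keeps {s}, integer comparison keeps {i}")
```

On this toy data the text comparison keeps fewer cells at cutoff 50 than at cutoff 250, because "75" sorts after "250" while "260" sorts before "50". The integer comparison decreases steadily. Converting the cutoff and the counts to integers before comparing avoids the effect.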