Screadcounts output file

juniajvs commented 11 months ago

Hello,

Thank you so much for creating this great package. Unfortunately, when typing the output file i used .txt instead of the other recommended extension types. Will that be a big issue?

My analysis have been running for over 3 days, but i can see that it is progressing and i believe it might be an issue related to the previous post that related that he was using only one core on his analysis. Is there a way to increase the number of cores used?

Thank you so much! Junia

edwardsnj commented 11 months ago

Hi @juniajvs, thanks for your interest in scReadCounts. Using .txt as the output extension should not be a big issue, it will just change the output format to the "Text" output format, which is essentially "space" separated on each line. The readCounts output also does not have headers in Text format, but the matrix files do. You should see partial output in the *.txt file with sorted loci, this can give you a sense of how much is left to go.

You can use the Advanced option "Threads" (-t) to indicate the number of CPUs that scReadCounts should (try to) use - and then check using "top" or similar. Each thread will appear as a separate process - and ideally each should show as 100% CPU utilization. If you access the BAM file over the network however, the I/O may be slow and the CPU utilization may be less than 100% for each thread.

How are you running scReadCounts? How many loci? How big is your BAM file?

This is a big compute. We are working on ways to speed it up, but we are not quite there yet. I did some testing based on the the previous issue and couldn't reproduce the problem. The user never got back to me to indicate whether they were still having an issue.

juniajvs commented 11 months ago

Thank you for your reply, Edward. I am running it on the command line. I have used the discovery function from varLoci, so it gave me up to 10million loci. My BAM files are have from 10GB to 30GB. How many threads do you recommend for the command line option?

juniajvs commented 11 months ago

I am also having another issue. Now, that the run is finished the output files are empty. This is the command line I used for the varLoci function: /data1/software/SCReadCounts-1.3.2.Linux-x86_64/bin/varLoci /data1/datasets/screadcounts/sample_1251_possorted_genome_md_bam.bam 2 > /data1/datasets/screadcounts/output/sample_1251_snv_loci.txt and this is the function i used for the screadcounts analysis using the varLoci file: /data1/software/SCReadCounts-1.3.2.Linux-x86_64/bin/scReadCounts -r /data1/datasets/screadcounts/sample_1251_possorted_genome_md_bam.bam -s /data1/datasets/screadcounts/output/sample_1251_snv_loci.txt -o /data1/datasets/screadcounts/output/sample_1251_screadcount_analysis_t20.csv -t 20 would you be able to give me some insight on this issue? Thank you!

edwardsnj commented 11 months ago

There isn't really any way for me to understand why might be happening without more information. Do you have 20 cpus on your machine? If not, setting -t 20 is likely to be counter productive. What did you set the barcode option(s) to? Does it match your data? What if you select a small subset (<100) of the snv_loci rows? Does it run correctly? What does top show you about the memory usage of scReadCounts (readCounts) when it is running? Is there enough? Any error messages or other diagnostics in the log output?

juniajvs commented 11 months ago

Is the barcode option -C? Could you explain exactly how should I do it? In the options it says: -C CELLBARCODE, --cellbarcode=CELLBARCODE Group reads based on cell-barcodes extracted from read name/identifiers or BAM-file tags. Options: CellRanger (Cell barcodes from the CB tag of aligned read - reads without a CB tag or with CB tag not in the accept list (default: file "barcodes.tsv" in the current directory) dropped), STARsolo (Cell barcodes from the CB tag of aligned read - reads without a CB tag or with CB tag not in the accept list (default: file "barcodes.tsv" in the current directory) dropped), UMI-tools (Cell barcodes from read name added by umi_tools). Default: UMI-tools.

I have cellranger barcodes, but I am not sure if I should just do -C {file path} or -C CELLBARCODE and set the working directory to the same folder that has the barcode file. Which option is the correct option?

Thank you! 😊

edwardsnj commented 11 months ago

If you have CellRanger barcodes then you would use -C CellRanger to indicate this. If you wish to use a barcode accept list, you can specify this using the -b <barcodes.tsv> option, where <barcodes.tsv> is replaced with the name of your barcodes file.

juniajvs commented 11 months ago

I am running it again with the -C CellRanger option. One more question: do you recommend any integration tools to use with the screadcount output? Are there any packages that can help with data visualization? Thank you so much!

HorvathLab / NGS

Screadcounts output file #17