GWW / scsnv

scSNV Mapping tool for 10X Single Cell Data
MIT License

Suggest Reference #22

Closed Alf-Kenz closed 1 year ago

Alf-Kenz commented 1 year ago

Hi,

Really nice work, I'm excited to try it on some of my data!

For normal human data, can you give advice on the exact reference you suggest/use in the paper? Would you just pull the most recent primary assembly, e.g. https://ftp.ensembl.org/pub/release-109/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz? As I load it in, it logs a lot about alt strands (of which it shows 0). Should I use a different reference instead?

And to confirm, should I run both scsnv index and bwa index? When I was doing it for the primary assembly, it was taking a very long time, and I saw that Illumina also hosts pre-indexed versions here.

GWW commented 1 year ago

Hi,

That is indeed the reference fasta file I have been using with scsnv for my projects.

The scsnv index command indexes the transcriptome rather than the genome (it uses this for spliced read counts), while the full genome is used for unspliced / intergenic read counts. You can use a pre-built BWA index for the bwa index portion as long as it was built from the same genome FASTA file.
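One way to sanity-check that a downloaded BWA index matches a given genome FASTA is to compare contig names and lengths. The sketch below is not part of scsnv. It assumes the usual layout of BWA's `.ann` file (a summary line whose second field is the sequence count, then two lines per contig: one with the name in the second field, one with the length in the second field). The function names are hypothetical. It only compares names and lengths, so it cannot detect two FASTA files that differ only in sequence content.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_contigs(fasta_path):
    """Return [(name, length), ...] for each record, in file order."""
    contigs = []
    name, length = None, 0
    with open_text(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    contigs.append((name, length))
                # The contig name is the first whitespace-delimited token.
                name = line[1:].split()[0]
                length = 0
            elif name is not None:
                length += len(line)
    if name is not None:
        contigs.append((name, length))
    return contigs


def ann_contigs(ann_path):
    """Parse a BWA .ann file into [(name, length), ...] (assumed layout)."""
    with open(ann_path, "rt") as handle:
        lines = [ln for ln in handle if ln.strip()]
    n_seqs = int(lines[0].split()[1])
    contigs = []
    for i in range(n_seqs):
        name = lines[1 + 2 * i].split()[1]      # "<gi> <name> <comment...>"
        length = int(lines[2 + 2 * i].split()[1])  # "<offset> <len> <n_ambs>"
        contigs.append((name, length))
    return contigs


def index_matches_fasta(fasta_path, ann_path):
    """True if contig names and lengths agree, in the same order."""
    return fasta_contigs(fasta_path) == ann_contigs(ann_path)
```

Calling `index_matches_fasta("genome.fa.gz", "prefix.ann")` (both paths are placeholders) before a long run is a cheap way to catch a mismatched pre-built index.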

The alt warnings are from BWA when running scsnv index. I am not sure why you would be receiving them if there aren't any alts present.
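To confirm independently whether a reference FASTA contains any ALT contigs, the headers can be scanned directly. This is a minimal sketch, not something scsnv or BWA provides. The `_alt` suffix test is an assumption based on UCSC-style GRCh38 naming (for example `chr1_KI270762v1_alt`); other sources name ALT sequences differently, so `is_alt` is left as a replaceable parameter. All function names are hypothetical.

```python
import gzip


def contig_names(fasta_path):
    """Yield the first token of every FASTA header line."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def looks_like_alt(name):
    """Assumption: ALT contigs carry a UCSC-style '_alt' suffix."""
    return name.endswith("_alt")


def summarize_alt_contigs(fasta_path, is_alt=looks_like_alt):
    """Return (total contig count, list of names flagged as ALT)."""
    names = list(contig_names(fasta_path))
    alts = [name for name in names if is_alt(name)]
    return len(names), alts
```

If the returned list is empty for the reference in use, the file contains no contigs that match the chosen naming rule, which is consistent with a report of 0 ALT contigs.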

The nice part is that once these indices are generated, you never have to generate them again.

Gavin

Alf-Kenz commented 1 year ago

Hi

Wowow, such a fast response, thanks so much. To triple check (feel free to not respond since you answered above; I'm just planning on putting a lot of compute time into this, so I wanted to be certain), I was talking about the Alt contigs logging here. As I was reading through the code, it seemed like the alt contig handling comes up a moderate amount, so I wanted to make sure the reference I was using had all the parts scsnv could use for the highest accuracy.

GWW commented 1 year ago

Hi @Alf-Kenz,

I don't specifically do anything with the ALT contigs during the alignment, but I don't think they should cause any issues. If you have some problems, let me know and I will try to fix them. I am working on making scSNV faster, particularly the read collapsing part. I just haven't had enough free time to work on it.

Let me know if you have any other issues and I am happy to help.

Gavin

Alf-Kenz commented 1 year ago

Amazing all around, thanks very much for such a fast and helpful response!

I might start running it on many hundreds of samples, so any speedups would be great :-)