ahmadalajami closed 3 months ago
I use this awk script to find sequences that exist in hla_nuc.fasta but not in hla.fasta:
awk ' /^[^>]/ {if(p==1) print; next} FNR == NR { G[$1] = 1; next } $1 in G { p=0; next} {print substr($1,2); p = 1; G[$1]=1 }' hla.fasta hla_nuc.fasta >HLA.exonOnly_nuc.id
For RNA-seq, you should just use hla_nuc.fasta.
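If only the allele names are wanted rather than the IDs plus their sequences, a variant along these lines might be enough. This is a rough sketch, not a tested pipeline: `nuc_only_alleles` is a made-up helper name, and it assumes the FASTA headers look like `>HLA:HLA00001 A*01:01:01:01 1098 bp`, with the accession in the first field and the allele name in the second.

```shell
# Hypothetical helper: print allele names present in the second FASTA
# (e.g. hla_nuc.fasta) but absent from the first (e.g. hla_gen.fasta).
# Assumption: header lines are ">ACCESSION ALLELE_NAME LENGTH bp".
nuc_only_alleles() {
  gen="$1"
  nuc="$2"
  awk '
    !/^>/         { next }                          # ignore sequence lines
    FNR == NR     { seen[$1] = 1; next }            # first file: record accessions
    !($1 in seen) { seen[$1] = 1; print $2 }        # second file: report new alleles once
  ' "$gen" "$nuc"
}
```

For example, `nuc_only_alleles hla_gen.fasta hla_nuc.fasta | grep -Fx 'A*03:04:01'` should print the allele name if it is one of the exon-only entries, and nothing otherwise.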
Hello Ahmad, thank you for your query. As discussed elsewhere, including in our FAQs, there are indeed differences between the number of alleles included in hla_nuc.fasta and hla_gen.fasta. This is because partial sequences that cover only exons are included in hla_nuc.fasta but not in hla_gen.fasta. With regard to how these files are used with something like scRNA-seq data, you will need to seek support from the source of this dataset or from the sequencing/software provider you are using.
Best,
Dominic
Hi there,
I am trying to quantify a particular allele in a scRNA-seq dataset. I found this allele, A*03:04:01, in Allelelist.txt and hla_nuc.fasta, but not in hla_gen.fasta. Which sequence do you suggest using when quantifying _nuc and _gen fasta files _nuc fasta file
Cheers, Ahmad