CTLife closed 7 years ago
If you're trying to map to a whole genome, I would point you to COMBINE-lab/salmon#49 or the README (TL;DR: RapMap is not for genomic alignments). If you have reads from cDNA, hybrid capture, or amplicons, I would suggest making a GTF or GFF feature file from the transcripts, capture regions, or target amplicons, and then using gffread to make a reduced FASTA. If you have a BED file with your regions of interest, you could also use mdshw5/pyfaidx, since it includes a CLI script to subset and filter FASTA files.
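To illustrate what "subsetting a FASTA with a BED file" amounts to, here is a minimal pure-Python sketch. It is not the pyfaidx CLI or gffread; the function names `read_fasta` and `extract_bed_regions` are made up for this example. It assumes the FASTA fits in memory and that the BED file uses the standard 0-based, half-open coordinates, so for a real genome one of the tools above is the better choice.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_bed_regions(fasta_lines, bed_lines):
    """Return (label, subsequence) for each BED interval found in the FASTA.

    BED starts are 0-based and ends are exclusive, which matches Python
    slicing. Labels are written 1-based as name:start-end (an assumption
    made here for readability).
    """
    seqs = dict(read_fasta(fasta_lines))  # whole FASTA held in memory
    regions = []
    for line in bed_lines:
        fields = line.split()
        # Skip comments, track/browser lines, and malformed rows
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in seqs:
            continue  # region refers to a sequence absent from the FASTA
        regions.append((f"{chrom}:{start + 1}-{end}", seqs[chrom][start:end]))
    return regions
```

The returned pairs can be written out as `>label` followed by the sequence to produce the reduced FASTA for indexing.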
To answer your original question: if the features you are sequencing from are not placed in the larger chromosome-sized contigs, then you'll want to keep the relevant scaffold sequences when you subset the FASTA file, before indexing with RapMap.
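As a rough sketch of that filtering step, the snippet below keeps only the records whose names appear in a keep-list, so the chromosomes plus any scaffolds that carry features of interest survive and everything else is dropped. The function name `subset_fasta` and the keep-list contents are illustrative assumptions, not part of RapMap or pyfaidx. It also assumes the sequence name is the first whitespace-delimited token of each header line, as in Ensembl FASTA files.

```python
def subset_fasta(lines, keep):
    """Yield FASTA lines only for records whose name is in `keep`.

    Streams line by line, so the genome is never loaded into memory.
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            # Name = first token of the header, e.g. "1" or "KI270728.1"
            parts = line[1:].split()
            name = parts[0] if parts else ""
            writing = name in keep
        if writing:
            yield line


# Hypothetical keep-list: primary chromosomes plus one scaffold that is
# assumed (for illustration only) to contain a feature of interest.
keep = {str(i) for i in range(1, 23)} | {"X", "Y", "MT", "KI270728.1"}
```

With real files this could be used as `out.writelines(subset_fasta(genome, keep))`, where `genome` and `out` are open file handles for the input and the reduced FASTA.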
OK, thank you.
Hi, I downloaded reference genomes from Ensembl (FASTA format), but many of the sequences are labeled "dna:scaffold": https://github.com/CTLife/TEMP/tree/master/RefGenomes
For example, in Mouse_GRCm38 (mm10), should everything other than chromosomes 1-19, Mt, X and Y be removed before mapping? https://github.com/CTLife/TEMP/blob/master/RefGenomes/Mouse_GRCm38.p4.txt
Likewise, Human_GRCh38.p5 (hg38), https://github.com/CTLife/TEMP/blob/master/RefGenomes/Human_GRCh38.p5.txt, contains 516 sequences. Apart from chromosomes 1-22, Mt, X and Y, should the others (such as CHR_HG2241_PATCH and KI270728.1) be removed before mapping?