Open marwaaswa opened 4 years ago
- I downloaded the files from "Index of /pub/CCDS/current_human" using these links:
  https://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi?REQUEST=ALLFIELDS&DATA=&ORGANISM=9606&BUILDS=CURRENTBUILDS
  ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human
  ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS_nucleotide.current.fna.gz
- However, there are two issues:
  1- Indexing produced 5 files instead of the 6 I expected (CCDS_nucleotide_current.fna.fai was not found):
     CCDS_nucleotide_current.fna.amb
     CCDS_nucleotide_current.fna.ann
     CCDS_nucleotide_current.fna.bwt
     CCDS_nucleotide_current.fna.pac
     CCDS_nucleotide_current.fna.sa
  2- When I ran the bwa command (splice-unaware alignment), the resulting SAM files contained only headers.
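The two symptoms above can be checked programmatically. The following is a minimal sketch, assuming Python 3 is available; the function names `missing_index_files` and `count_sam_records` are hypothetical helpers invented for illustration, and the suffix list is taken only from the file names quoted in this post.

```python
from pathlib import Path

# Suffixes named in the post: the five index files that were created plus
# the .fai file that was expected but not found.
EXPECTED_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa", ".fai")


def missing_index_files(fasta_path, suffixes=EXPECTED_SUFFIXES):
    """Return the names of expected index files that are absent or empty."""
    missing = []
    for suffix in suffixes:
        candidate = Path(str(fasta_path) + suffix)
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(candidate.name)
    return missing


def count_sam_records(sam_path):
    """Return (header_lines, alignment_records) for a plain-text SAM file.

    Header lines start with '@'; every other non-blank line is counted as
    an alignment record.
    """
    headers = 0
    records = 0
    with open(sam_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith("@"):
                headers += 1
            else:
                records += 1
    return headers, records
```

For example, `missing_index_files("CCDS_nucleotide_current.fna")` would list only the `.fai` file in the situation described, and a SAM file for which `count_sam_records` returns zero records contains headers only. If the aligner is BWA (an assumption based on the file extensions), note that a `.fai` file is normally written by `samtools faidx` rather than by `bwa index`, so five files from the indexing step alone may not indicate a failure.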
- We are looking for a reference exome (CDS) to align our exome sequencing data against, for several reasons:
  1- When we aligned to a single chromosome such as chr 20, the alignment rate was very low (2.8%).
  2- When we aligned to the combined chromosomes 3, 7, 11 and 20, which are the ones published in this project, our computers crashed (due to our limited computational resources).
  3- When we downloaded another chromosome, chr 11, to improve the alignment rate, indexing took over 8 hours and still did not complete. It produced 6 index files instead of 9 and failed to create:
     Homo_sapiens.GRCh38.dna.chromosome.11.5.ht2
     Homo_sapiens.GRCh38.dna.chromosome.11.6.ht2
     Homo_sapiens.GRCh38.dna.chromosome.11.rf
- I know that using the whole genome as a reference is better than using coding sequences, because it avoids losing important data in the flanking regions. We are trying this approach only to avoid the high computational requirements of downstream analysis with a whole-genome reference.
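To compare alignment rates between references (such as the 2.8% mentioned above), the rate can be recomputed directly from a SAM file. This is a rough sketch, assuming a plain-text SAM file; `alignment_rate` is a hypothetical helper, and its result may differ slightly from the summary an aligner prints, since each mate of a pair is counted here as a separate read.

```python
# SAM FLAG bits used below (defined in the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def alignment_rate(sam_path):
    """Return (mapped, total, percent_mapped) over primary SAM records.

    Secondary and supplementary records are skipped so that each read
    is counted once.
    """
    total = 0
    mapped = 0
    with open(sam_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            flag = int(fields[1])  # second column is the bitwise FLAG
            if flag & (SECONDARY | SUPPLEMENTARY):
                continue
            total += 1
            if not flag & UNMAPPED:
                mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent
```

Running this on the SAM output for each candidate reference gives directly comparable numbers without rerunning the aligner's own summary step.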