aakechin / NGS-PrimerPlex

NGS-PrimerPlex is a high-throughput tool for multiplex primer design
GNU General Public License v3.0

GenBank files for hg38, separated by chromosome #25

Closed: kiavash17 closed this issue 2 years ago

kiavash17 commented 3 years ago

Hi,

Thank you very much for sharing this great tool. I have used the Linux version with hg19 and it works very well. For my application I need to use hg38. I followed the instructions: I downloaded hg38.2bit, converted it to FASTA, and indexed it with BWA. I would like to extract genome regions automatically, and for this I need GenBank files for each chromosome. Can you please help me find these files for hg38? The only file I was able to find is GCA_000001405.28_GRCh38.p13_genomic.gbff from: ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.28_GRCh38.p13/GCA_000001405.28_GRCh38.p13_genomic.gbff.gz

I'm wondering how I could divide this large file (covering the entire human reference genome) into individual chromosome files, or whether I can download files already divided by chromosome from another source.

Thank you.
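For the splitting question, here is a minimal stdlib-only Python sketch. It is not the procedure from the NGS-PrimerPlex wiki, which should be preferred. It assumes the standard GenBank flat-file layout, where every record ends with a `//` line and the sequence length appears on the LOCUS line. It also assumes that chromosome records carry a `/chromosome="..."` qualifier in their source feature, and that the longest record for each chromosome is the full chromosome rather than an unlocalized scaffold or alternate locus. The function names and the `chr<N>.gb` output names are hypothetical; use whatever file names the wiki says NGS-PrimerPlex expects.

```python
import gzip
import re
from pathlib import Path

# Sequence length on a LOCUS line, e.g. "LOCUS  CM000663  248956422 bp  DNA ..."
LOCUS_LEN_RE = re.compile(r"^LOCUS\s+\S+\s+(\d+)\s+bp")
# Value of the /chromosome="..." qualifier (assumed to sit in the source feature)
CHROM_RE = re.compile(r'/chromosome="([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed GenBank file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_records(handle):
    """Yield GenBank records as lists of lines; each record ends with a '//' line."""
    record = []
    for line in handle:
        record.append(line)
        if line.rstrip("\r\n") == "//":
            yield record
            record = []
    # Trailing lines without a closing '//' are not a complete record and are dropped.


def record_info(record):
    """Return (length_in_bp, chromosome) for one record; either may be None."""
    length = chrom = None
    for line in record:
        if length is None:
            m = LOCUS_LEN_RE.match(line)
            if m:
                length = int(m.group(1))
        if chrom is None:
            m = CHROM_RE.search(line)
            if m:
                chrom = m.group(1)
        if length is not None and chrom is not None:
            break  # both found in the header/source feature; no need to scan the sequence
    return length, chrom


def split_by_chromosome(gbff_path, out_dir):
    """Write the longest record for each /chromosome value to out_dir/chr<N>.gb.

    Hypothetical helper: a longer record for the same chromosome overwrites a
    shorter one, so scaffolds sharing the qualifier are replaced by the full
    chromosome. Records without a length or /chromosome qualifier are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    best = {}  # chromosome -> length of the record currently on disk
    with open_text(gbff_path) as handle:
        for record in iter_records(handle):
            length, chrom = record_info(record)
            if length is None or chrom is None:
                continue
            if length > best.get(chrom, -1):
                (out / f"chr{chrom}.gb").write_text("".join(record))
                best[chrom] = length
    return best
```

A possible call is `split_by_chromosome("GCA_000001405.28_GRCh38.p13_genomic.gbff.gz", "hg38_genbank")`. The file is streamed, but one whole record is held in memory at a time, which for chromosome 1 means several hundred megabytes.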

aakechin commented 3 years ago

@kiavash17 Thank you for the question! Have you looked at the Wiki pages? https://github.com/aakechin/NGS-PrimerPlex/wiki/NGS-PrimerPlex-installation-in-linux-as-a-standalone-tool. There is a detailed manual on how to prepare hg38 GenBank files. Do not hesitate to contact me again if any questions remain.

kiavash17 commented 3 years ago

Thank you so much @aakechin! I had missed the wiki page, and it was indeed very helpful. I have now downloaded all the GenBank files and named them as required. When I try to design primers or run test.py, I can see that the script tries to create geneNameToChromosome.csv in the reference directory. However, the created table is empty (0 KB), so when the script looks up any gene in the table, I get this error: ERROR (14)! The following gene was not found in the reference genome:

KRAS

To check whether the issue also happens with hg19, I recreated geneNameToChromosome.csv for hg19 from the hg19 GenBank files. That ran correctly and produced a correct geneNameToChromosome.csv.

Thanks again for your help.

aakechin commented 2 years ago

@kiavash17 Could you check your GenBank files, e.g. with the less command? The getGeneRegions.py script searches for 'gene' features. Do the files contain such GenBank features?
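As an alternative to paging through multi-gigabyte files with less, the sketch below counts the feature keys (gene, CDS, mRNA, ...) in a GenBank file's FEATURES tables. It is a hypothetical helper that relies only on the standard flat-file column layout (feature keys start in column 6, qualifiers are indented to column 22). It does not reproduce how getGeneRegions.py parses the files; it only shows whether any 'gene' features are present.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed GenBank file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_feature_keys(path):
    """Count feature keys in the FEATURES tables of a (possibly multi-record) GenBank file."""
    counts = Counter()
    in_features = False
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("FEATURES"):
                in_features = True
            elif in_features and not line.startswith(" "):
                # A line starting in column 1 (ORIGIN, CONTIG, //, ...) ends the feature table.
                in_features = False
            elif in_features:
                # Feature keys occupy columns 6-21; qualifier lines are blank there.
                key = line[5:21].strip()
                if key:
                    counts[key] += 1
    return counts
```

For example, `count_feature_keys("chr12.gb")["gene"]` (file name hypothetical) returning 0 would suggest that the file has no 'gene' features for getGeneRegions.py to collect, which would be consistent with an empty geneNameToChromosome.csv.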