Open kendrasc opened 5 months ago
Right now the script expects the provided FASTA file to contain only the chromosome of interest. If your FASTA file contains the whole reference, the easiest fix is to extract chromosome 11 into a separate file and use that instead.
This should fix the issue for you, but I will keep in mind that it might be a good idea to allow files containing the whole reference. Thanks for the feedback, and let me know if there are any other issues!
Thanks for the quick response! How might I change it to allow using files with the whole reference?
I can make some changes in the code and let you know when the code has been updated. Could you give me an example of the format for the sequence id / header row of the reference fasta file you use? However, if you don't wish to wait for the update (I will try to get this added asap), right now the fastest solution for you is still to extract chromosome 11 (either with grep, samtools or something else), write it to a separate file and use that.
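For anyone who wants to do the extraction without extra tools, here is a minimal Python sketch of the workaround described above. It is not part of GeneToKmer.py; the function names `list_headers` and `extract_record` are made up for illustration, and it assumes the sequence id is the first whitespace-separated word of each `>` header line. Listing the headers first also shows what chromosome 11 is actually called in your file.

```python
def list_headers(src):
    """Return every FASTA header line (without the leading '>') found in src."""
    return [line[1:].rstrip("\n") for line in src if line.startswith(">")]


def extract_record(src, dst, wanted_id):
    """Copy the first record whose sequence id equals wanted_id from src to dst.

    The sequence id is assumed to be the first whitespace-separated word
    of the header line. Returns True if a matching record was copied.
    """
    copying = False
    found = False
    for line in src:
        if line.startswith(">"):
            if found:
                # The record we wanted has ended; stop reading.
                break
            fields = line[1:].split()
            copying = bool(fields) and fields[0] == wanted_id
            found = copying
        if copying:
            dst.write(line)
    return found
```

A possible way to use it, where the output filename and the sequence id are placeholders (the real id depends on your reference file, so check the output of `list_headers` first):

```python
with open("HG_38.fna") as src:
    for header in list_headers(src):
        print(header)

with open("HG_38.fna") as src, open("chr11.fna", "w") as dst:
    if not extract_record(src, dst, "ID_OF_CHROMOSOME_11"):
        print("No record with that id was found")
```

The extraction reads the file line by line, so it does not need to hold the whole reference in memory.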
Using glistmaker, I created a .list file from a FASTA file of the HG38 reference genome. I can build .db files from this .list file, and those .db files report that they are located on my chromosome of interest (11), but the k-mers themselves all seem to be on chromosome 1. The output gene sequence is also from chromosome 1.
What might be going on here?
I used the following scripts:
glistmaker HG_38.fna -w 25 -o HG_38
python GeneToKmer.py Genes.txt HG_38.fna HG_38_25.list -o Genes_database/ -i -gt ${my_directory}/GenomeTester4/src/
And my gene locations were:
PGA3 G 11:61203515-61213098
PGA4 G 11:61222347-61231694
PGA5 G 11:61241175-61251444